@sogni-ai/sogni-creative-agent-skill 3.3.1 → 3.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -489,7 +489,7 @@ sogni-agent --persona-list
  sogni-agent --persona-remove "Mark"
  ```
 
- Stored at `~/.config/sogni/personas/`. Pronouns like "me" / "myself" auto-resolve to the `self` persona; "my wife" resolves to `partner`, etc.
+ Stored at `~/.config/sogni/personas/`. Personas resolve by explicit saved name, id, or tag/alias; relationship phrases are not treated as persona identifiers.
 
  ### Memory (persistent preferences)
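The new wording resolves personas only by an explicit saved name, id, or tag/alias, and no longer maps relationship phrases. The TypeScript sketch below shows one way such a lookup could work. The `Persona` shape, the case-insensitive comparison, and the id, then name, then tag precedence are assumptions for illustration, not the package's actual resolver.

```typescript
// Hypothetical persona record; the field names are assumptions, not the package schema.
interface Persona {
  id: string;
  name: string;
  tags: string[]; // tags and aliases are treated alike in this sketch
}

// Resolve a reference only by explicit id, saved name, or tag/alias.
// Relationship phrases such as "me" or "my wife" are deliberately never matched.
function resolvePersona(ref: string, personas: Persona[]): Persona | undefined {
  const key = ref.trim().toLowerCase(); // case-insensitive matching is an assumption
  const byId = personas.filter((p) => p.id.toLowerCase() === key);
  if (byId.length > 0) return byId[0];
  const byName = personas.filter((p) => p.name.toLowerCase() === key);
  if (byName.length > 0) return byName[0];
  const byTag = personas.filter((p) => p.tags.some((t) => t.toLowerCase() === key));
  return byTag.length > 0 ? byTag[0] : undefined;
}

const saved: Persona[] = [
  { id: "p-001", name: "Mark", tags: ["marky"] },
  { id: "p-002", name: "Ana", tags: ["ana-b"] },
];

console.log(resolvePersona("mark", saved)?.id); // p-001 (by name)
console.log(resolvePersona("marky", saved)?.id); // p-001 (by alias)
console.log(resolvePersona("my wife", saved)); // undefined (relationship phrase)
```

Returning `undefined` instead of guessing keeps an unknown reference from silently pulling in someone's reference photo.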
 
@@ -561,13 +561,13 @@ Options cycle sequentially per image. Without `{...}` syntax, `-n` produces mult
 
  ## Token Auto-Fallback
 
- Use `--token-type auto` to retry with SOGNI tokens when SPARK is insufficient:
+ Use `--token-type auto` to retry native Sogni models with SOGNI tokens when SPARK is insufficient:
 
  ```bash
  sogni-agent --token-type auto "a dragon eating tacos"
  ```
 
- Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
+ Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never use SOGNI fallback.
 
  ---
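The revised fallback rule splits models into two groups: native Sogni models try SPARK and then SOGNI, while vendor models need Premium Spark eligibility and never fall back. A minimal sketch of that decision follows. It assumes the job cost and balances are known up front and that vendor jobs are charged in SPARK; the real CLI reacts to an insufficient-balance response rather than pre-checking, and `Model`, `Account`, and `chooseTokenType` are hypothetical names.

```typescript
type TokenType = "spark" | "sogni";

// Hypothetical descriptors; these field names are assumptions for illustration.
interface Model {
  id: string;
  vendor: boolean; // true for vendor models such as Seedance or GPT Image 2
}

interface Account {
  spark: number;
  sogni: number;
  premiumSparkEligible: boolean;
}

// Decide which balance a job would be charged to, or null if it cannot run.
function chooseTokenType(model: Model, cost: number, acct: Account): TokenType | null {
  if (model.vendor) {
    // Vendor models need Premium Spark eligibility and never fall back to SOGNI.
    return acct.premiumSparkEligible && acct.spark >= cost ? "spark" : null;
  }
  if (acct.spark >= cost) return "spark"; // SPARK (free daily tokens) is tried first
  if (acct.sogni >= cost) return "sogni"; // --token-type auto fallback, native models only
  return null;
}

const native: Model = { id: "native-image-model", vendor: false }; // hypothetical id
const seedance: Model = { id: "seedance2", vendor: true };
const acct: Account = { spark: 10, sogni: 500, premiumSparkEligible: false };

console.log(chooseTokenType(native, 50, acct)); // sogni
console.log(chooseTokenType(seedance, 50, acct)); // null
```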
 
package/SKILL.md CHANGED
@@ -2,7 +2,7 @@
  name: sogni-creative-agent-skill
  description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
  metadata:
- version: "3.1.1"
+ version: "3.3.3"
  homepage: https://sogni.ai
  clawdbot:
  emoji: "🎨"
@@ -165,7 +165,7 @@ sogni-agent -Q pro "a cat wearing a hat" # flux2_dev, 40 steps, 1024x1024 (
  sogni-agent -n 3 "a {red|blue|green} sports car"
  # → generates "a red sports car", "a blue sports car", "a green sports car"
 
- # Token auto-fallback (tries SPARK, falls back to SOGNI)
+ # Token auto-fallback for native Sogni models (tries SPARK, falls back to SOGNI)
  sogni-agent --token-type auto "a cat wearing a hat"
 
  # Save to file
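The `-n 3 "a {red|blue|green} sports car"` example relies on `{...}` options cycling sequentially, one per image. A small sketch of that expansion is below. The regex-based parser, the wrap-around when `-n` exceeds the option count, and the rule that each group cycles independently by image index are assumptions, not the CLI's implementation.

```typescript
// Expand every {a|b|c} group for image number `index` (0-based).
// Each group cycles through its options in order and wraps around.
function expandForImage(template: string, index: number): string {
  return template.replace(/\{([^{}]*)\}/g, (_match: string, body: string) => {
    const options = body.split("|");
    return options[index % options.length];
  });
}

// Produce one prompt per requested image, as `-n` would.
function expandAll(template: string, n: number): string[] {
  const prompts: string[] = [];
  for (let i = 0; i < n; i++) prompts.push(expandForImage(template, i));
  return prompts;
}

for (const p of expandAll("a {red|blue|green} sports car", 3)) console.log(p);
// a red sports car
// a blue sports car
// a green sports car
```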
@@ -732,7 +732,7 @@ For **any transition video work**, always use the **Sogni skill/plugin** (not ra
 
  ### Insufficient Funds Handling
 
- Use `--token-type auto` to automatically retry with SOGNI tokens when SPARK is insufficient.
+ Use `--token-type auto` to automatically retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never fall back to SOGNI.
 
  When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply:
 
@@ -951,7 +951,7 @@ sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -
  # Photobooth: stylize a face photo
  sogni-agent -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
 
- # Token auto-fallback (tries SPARK first, retries with SOGNI on insufficient balance)
+ # Token auto-fallback for native Sogni models (tries SPARK first, retries with SOGNI on insufficient balance)
  sogni-agent -q --token-type auto -o /tmp/generated.png "user's prompt"
 
  # Check current SPARK/SOGNI balances (no prompt required)
@@ -1086,7 +1086,7 @@ Balance check example (`--json --balance`):
 
  ## Cost
 
- Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens when SPARK is insufficient.
+ Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens for native Sogni models when SPARK is insufficient. Seedance and GPT Image 2 are vendor models and require Premium Spark eligibility; they never use SOGNI fallback.
 
  ## Persona System
 
@@ -1116,19 +1116,13 @@ sogni-agent --persona-remove "Mark"
 
  ### Persona Pipeline Rules
 
- When a user mentions a persona (by name, tag, or pronoun):
+ When a user mentions a persona by explicit saved name, id, or tag/alias:
 
  1. **For images:** Use `--persona "Name" "prompt"` which auto-injects the persona's reference photo as context and selects the Qwen editing model
  2. **For video with voice cloning:** The persona's voice clip is used as `--reference-audio-identity` when `--video` is combined with `--persona`
  3. **For video without voice clip:** Describe the voice in the prompt ("speaks in a warm alto with a British accent")
 
- **Pronoun matching:**
- - "me" / "myself" / "I" → persona with `relationship: self`
- - "my wife" / "my husband" / "my partner" → persona with `relationship: partner`
- - "my son" / "my daughter" / "my kid" → persona with `relationship: child`
- - "my dog" / "my cat" / "my pet" → persona with `relationship: pet`
-
- **Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by name or pronoun. For ad-hoc photos, use `-c` (context image) directly.
+ **Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by explicit name, id, or tag/alias. For ad-hoc photos, use `-c` (context image) directly.
 
  ## Memory System
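Taken together, the three pipeline rules and the "uploads are not personas" rule amount to a small routing decision. The sketch below builds an argument list for `sogni-agent` under those rules. Only the `--persona`, `--video`, and `-c` flags come from the text; the `Job` type, `buildArgs`, the argument order, and appending the voice description to the prompt are illustrative assumptions.

```typescript
// Hypothetical request shape for this sketch.
interface Job {
  prompt: string;
  persona?: string; // explicit saved name, id, or tag/alias
  uploadedPhoto?: string; // ad-hoc user upload; never treated as a persona
  video?: boolean;
  personaHasVoiceClip?: boolean;
  voiceDescription?: string; // used only when no voice clip is saved
}

function buildArgs(job: Job): string[] {
  const args: string[] = [];
  let prompt = job.prompt;
  if (job.video) args.push("--video");
  if (job.persona) {
    // Rules 1 and 2: --persona injects the reference photo; with --video the
    // saved voice clip is used as --reference-audio-identity by the tool itself.
    args.push("--persona", job.persona);
    if (job.video && !job.personaHasVoiceClip && job.voiceDescription) {
      // Rule 3: no voice clip, so describe the voice in the prompt instead.
      prompt = `${prompt}, ${job.voiceDescription}`;
    }
  } else if (job.uploadedPhoto) {
    args.push("-c", job.uploadedPhoto); // ad-hoc photo passed as a context image
  }
  args.push(prompt);
  return args;
}

console.log(buildArgs({ prompt: "as a superhero", persona: "Mark" }).join(" "));
// --persona Mark as a superhero
```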
 
@@ -314,12 +314,12 @@ const GATING_POLICIES = [
  "trigger": {
  "allOf": [
  "has_active_persona",
- "requests_video_generation",
+ "requests_persona_video_generation",
  "no_persona_image_in_session"
  ],
  "sources": {
  "has_active_persona": "session_state",
- "requests_video_generation": "planner",
+ "requests_persona_video_generation": "planner",
  "no_persona_image_in_session": "session_state"
  }
  },
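The rename narrows when this gating policy fires: the planner signal now has to indicate a persona video request, not just any video request. The sketch below evaluates an `allOf` trigger against per-source signals. Reading `allOf` as "every listed signal is true in its declared source" is an assumption, and the `Signals` structure and `triggerFires` helper are hypothetical; only the signal and source names come from the hunk.

```typescript
type Source = "session_state" | "planner";

// Shape mirrors the "trigger" object in the hunk above.
interface Trigger {
  allOf: string[];
  sources: { [signal: string]: Source };
}

// Boolean signals reported by each source (hypothetical structure).
type Signals = { [S in Source]: { [signal: string]: boolean } };

// Fires only if every allOf signal is reported true by its declared source.
function triggerFires(trigger: Trigger, signals: Signals): boolean {
  return trigger.allOf.every((name) => {
    const source = trigger.sources[name];
    return source !== undefined && signals[source][name] === true;
  });
}

const personaVideoTrigger: Trigger = {
  allOf: ["has_active_persona", "requests_persona_video_generation", "no_persona_image_in_session"],
  sources: {
    has_active_persona: "session_state",
    requests_persona_video_generation: "planner",
    no_persona_image_in_session: "session_state",
  },
};

const session = { has_active_persona: true, no_persona_image_in_session: true };

// A generic video request no longer fires the policy, even with an active persona.
console.log(triggerFires(personaVideoTrigger, {
  session_state: session,
  planner: { requests_video_generation: true },
})); // false

// A video request that actually involves the persona does.
console.log(triggerFires(personaVideoTrigger, {
  session_state: session,
  planner: { requests_persona_video_generation: true },
})); // true
```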
@@ -2160,12 +2160,12 @@ const PROMPT_CONTRACTS = [
  "contractId": "animate_photo_v1",
  "version": "1.0.0",
  "toolName": "animate_photo",
- "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 2.5 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nAlways check: total dialogue words ÷ 2.5 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. 
Aim below these ceilings. If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so never split one planned batch into\n\"first 3\" and \"remaining clips\" calls. Do NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: NEVER call animate_photo N times sequentially — ALWAYS\nuse sourceImageIndices in ONE call so all N projects run in parallel. Two flavors:\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP CONTENT — when each clip has DIFFERENT dialogue, jokes, narration, or motion,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. 
The top-level prompt is still required — pass a brief batch summary.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". 
If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED SKITS: When the user supplies one uploaded reference image and\nasks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH the\nfirst frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. 
If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, always call stitch_video with\nthose video indices before finalizing.",
+ "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 2.5 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nAlways check: total dialogue words ÷ 2.5 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. 
Aim below these ceilings. If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so never split one planned batch into\n\"first 3\" and \"remaining clips\" calls. Do NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: NEVER call animate_photo N times sequentially — ALWAYS\nuse sourceImageIndices in ONE call so all N projects run in parallel. Two flavors:\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP CONTENT — when each clip has DIFFERENT dialogue, jokes, narration, or motion,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. The top-level prompt is still required — pass a brief batch summary.\nFor explicit last/end-frame-only batches, reuse the image through sourceImageIndices but set\nframeRole=\"end\" and omit endImageIndex/endImageIndices. 
This means each listed image is the\nlast frame for its corresponding clip and no first/start frame is supplied.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". 
If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED SKITS: When the user supplies one uploaded reference image and\nasks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH the\nfirst frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. 
If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, always call stitch_video with\nthose video indices before finalizing.",
  "parameterDocs": {
- "sourceImageIndices": "Batch source image indices. Read startIndex from prior generate_image/edit_image result. Negative = uploaded images (-1 = first upload).",
+ "sourceImageIndices": "Batch source image indices. Read startIndex from prior generate_image/edit_image result. Negative = uploaded images (-1 = first upload). May be paired with frameRole=\"end\" only for explicit last/end-frame-only fan-out.",
  "prompts": "Per-clip prompt array. Length MUST equal sourceImageIndices.length when both are set.",
  "duration": "Per-clip duration in seconds. Target 15s when dialogue is involved and total length is given without per-clip spec.",
- "frameRole": "Set to \"both\" for first+last frame transitions using sourceImageIndices + endImageIndices.",
+ "frameRole": "Set to \"end\" for explicit last/end-frame-only fan-out; set to \"both\" for first+last frame transitions using sourceImageIndices + endImageIndices.",
  "endImageIndices": "End frames for adjacent-chain transitions. N images → N-1 clips."
  }
  },
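The animate_photo contract packs several index rules into prose. Generated images start at the latest result's `startIndex`, uploads use negative indices, an adjacent transition chain of N keyframes yields N-1 clips, and the new end-frame-only fan-out sets `frameRole="end"` with no end indices. The helpers below build those argument objects under those rules. The helper names and the `AnimateArgs` interface are hypothetical; only the field names come from the contract.

```typescript
// Field names follow the animate_photo contract; the interface itself is hypothetical.
interface AnimateArgs {
  prompt: string; // top-level prompt is still required (a brief batch summary)
  sourceImageIndices: number[];
  endImageIndices?: number[];
  frameRole?: "both" | "end";
  prompts: string[];
}

// Indices of `count` images from the latest generate_image/edit_image result.
function generatedIndices(startIndex: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) out.push(startIndex + i);
  return out;
}

// Uploaded images use negative indices: -1 is the first upload, -2 the second, and so on.
function uploadedIndices(count: number): number[] {
  const out: number[] = [];
  for (let i = 1; i <= count; i++) out.push(-i);
  return out;
}

// prompts.length must equal sourceImageIndices.length.
function checkPrompts(sources: number[], prompts: string[]): void {
  if (prompts.length !== sources.length) {
    throw new Error(`expected ${sources.length} prompts, got ${prompts.length}`);
  }
}

// Adjacent transition chain: N keyframes give N-1 clips, each ending on the next keyframe.
function transitionChain(summary: string, keyframes: number[], prompts: string[]): AnimateArgs {
  const sources = keyframes.slice(0, -1);
  checkPrompts(sources, prompts);
  return {
    prompt: summary,
    sourceImageIndices: sources,
    endImageIndices: keyframes.slice(1),
    frameRole: "both",
    prompts,
  };
}

// End-frame-only fan-out: each listed image is the last frame of its own clip,
// no start frame is supplied, and endImageIndex/endImageIndices are omitted.
function endFrameFanOut(summary: string, images: number[], prompts: string[]): AnimateArgs {
  checkPrompts(images, prompts);
  return { prompt: summary, sourceImageIndices: images, frameRole: "end", prompts };
}

console.log(generatedIndices(3, 4).join(",")); // 3,4,5,6
const chain = transitionChain("keyframe tour", uploadedIndices(5), ["t1", "t2", "t3", "t4"]);
console.log(chain.sourceImageIndices.join(","), chain.endImageIndices?.join(","));
// -1,-2,-3,-4 -2,-3,-4,-5
```

Validating `prompts.length` before building the object mirrors the contract's "MUST equal" rule, so a mismatched batch fails locally instead of at the handler.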
@@ -2184,7 +2184,7 @@ const PROMPT_CONTRACTS = [
  "contractId": "generate_video_v1",
  "version": "1.1.0",
  "toolName": "generate_video",
- "baseDescription": "generate_video produces text-to-video clips and Seedance multimodal reference videos.\nUse for text-only video generation with no source image input. For Seedance, also use this\ntool when uploaded/generated images, videos, or audio are loose references. Use animate_photo\nonly when a non-Seedance source image must become the first frame of an LTX/WAN animation.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: When the user uploads a storyboard, shot sheet,\nmood board, or trailer concept image and asks to make a movie trailer/video/clip from it,\ndefault to one Seedance generate_video call with referenceImageIndices=[-1]. Do not first\nextract panels with edit_image, do not generate replacement keyframes, and do not make four\nseparate LTX animate_photo clips unless the user explicitly asks for separate clips or LTX.\nUse seedance2 when premium Spark access is available; if premium access is unavailable,\nexplain the limitation or use the best non-Seedance fallback the user accepts.\n\nSTORYTELLING / COMMERCIAL / TRAILER PROMPTS: For creative video requests, turn the brief\ninto timed, causally connected visual beats before writing the final prompt. Default social\nvideo is 15s 9:16 with a strong first 1-2s, visible escalation, payoff, and brand/CTA/final\nimage. Commercials should show audience desire/problem, transformation, proof/benefit, and\nCTA. Trailers should follow hook → world → disruption → escalation → reveal → title/CTA.\nEvery beat must be generatable: subject, setting, action, camera, lighting, audio, and text\nrole where relevant. Avoid vague \"cinematic\" filler, feature dumps, and beautiful images with\nno visible change.\n\nVIDEO PROMPT QUOTING: ONLY use double quotes for spoken dialogue in video prompts. Never\nquote on-screen text, titles, captions, or visual text elements — describe them without\nquotes. 
Quotes signal speech to the model and confuse audio generation.\n\nSTORYBOARD TEXT: Structural headings, section numbers, slide titles, panel titles, and\ncaptions in storyboard references may become short audio-only narration/VO or\nkey-message beats, but they are not subtitles, title cards, lower thirds, or visible\noverlays unless the user explicitly asks for visible text, on-screen text, a title\ncard, subtitle, lower third, signage, or CTA. Keep narration as separate brief phrases\nwith pauses; do not concatenate storyboard labels into run-on voiceover.\n\nDIALOGUE DURATION: Spoken dialogue must fit the clip. Estimate 2.5 words per second\nnatural delivery plus ~1s per acting beat. Hard maximum 3.75 words/second.\nCheck: dialogue words ÷ 2.5 + beats ≤ duration. Do not submit oversized dialogue.\n\nLATEST USER DURATION WINS: In follow-up turns, use the newest duration the user states,\neven if a previous assistant message mentioned a longer script/runtime. For example, if\nhistory says \"the full script is 66 seconds\" but the user now says \"do a 30 second version\",\ngenerate the 30 second version. Do not ask a clarification question just because history\ncontains another duration; treat the latest user request as the override.\n\nSEEDANCE SHORT-DURATION LIMIT: Seedance supports 4-15s clips. If the user explicitly asks\nfor Seedance below 4s, do not silently round up. Ask whether they prefer a 4s Seedance clip\nor an exact-duration LTX clip. If the user did not explicitly ask for Seedance, choose the\nmodel/tool that can satisfy the requested duration exactly.",
+ "baseDescription": "generate_video produces text-to-video clips and Seedance multimodal reference videos.\nUse for text-only video generation with no source image input. For Seedance, also use this\ntool when uploaded/generated images, videos, or audio are loose references. Use animate_photo\nonly when a non-Seedance source image must become the first frame of an LTX/WAN animation.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: When the user uploads a storyboard, shot sheet,\nmood board, or trailer concept image and asks to make a movie trailer/video/clip from it,\ndefault to one Seedance generate_video call with referenceImageIndices=[-1]. Do not first\nextract panels with edit_image, do not generate replacement keyframes, and do not make four\nseparate LTX animate_photo clips unless the user explicitly asks for separate clips or LTX.\nUse seedance2 when premium Spark access is available; if premium access is unavailable,\nexplain the limitation or use the best non-Seedance fallback the user accepts.\n\nEXACT / INCLUDED VIDEO PROMPTS: If the user asks for a Seedance video using uploaded or\ngenerated references and says to use a prompt exactly, pass only that literal quoted prompt\nto generate_video and set skipPromptProcessing=true plus expandPrompt=false. Do not treat\nwords inside the literal prompt, such as storyboard, script, thumbnails, or panels, as a\nrequest to create a storyboard image. If the user includes a timecoded script inside a\nvideo request, keep it in the generate_video prompt. Explicit constraints like no storyboard\npanels, no subtitles, or no captions are constraints on the video render, not instructions\nto call edit_image or generate_image.\n\nSTORYTELLING / COMMERCIAL / TRAILER PROMPTS: For creative video requests, turn the brief\ninto timed, causally connected visual beats before writing the final prompt. Default social\nvideo is 15s 9:16 with a strong first 1-2s, visible escalation, payoff, and brand/CTA/final\nimage. 
Commercials should show audience desire/problem, transformation, proof/benefit, and\nCTA. Trailers should follow hook → world → disruption → escalation → reveal → title/CTA.\nEvery beat must be generatable: subject, setting, action, camera, lighting, audio, and text\nrole where relevant. Avoid vague \"cinematic\" filler, feature dumps, and beautiful images with\nno visible change.\n\nVIDEO PROMPT QUOTING: ONLY use double quotes for spoken dialogue in video prompts. Never\nquote on-screen text, titles, captions, or visual text elements — describe them without\nquotes. Quotes signal speech to the model and confuse audio generation.\n\nSTORYBOARD TEXT: Structural headings, section numbers, slide titles, panel titles, and\ncaptions in storyboard references may become short audio-only narration/VO or\nkey-message beats, but they are not subtitles, title cards, lower thirds, or visible\noverlays unless the user explicitly asks for visible text, on-screen text, a title\ncard, subtitle, lower third, signage, or CTA. Keep narration as separate brief phrases\nwith pauses; do not concatenate storyboard labels into run-on voiceover.\n\nDIALOGUE DURATION: Spoken dialogue must fit the clip. Estimate 2.5 words per second\nnatural delivery plus ~1s per acting beat. Hard maximum 3.75 words/second.\nCheck: dialogue words ÷ 2.5 + beats ≤ duration. Do not submit oversized dialogue.\n\nLATEST USER DURATION WINS: In follow-up turns, use the newest duration the user states,\neven if a previous assistant message mentioned a longer script/runtime. For example, if\nhistory says \"the full script is 66 seconds\" but the user now says \"do a 30 second version\",\ngenerate the 30 second version. Do not ask a clarification question just because history\ncontains another duration; treat the latest user request as the override.\n\nSEEDANCE SHORT-DURATION LIMIT: Seedance supports 4-15s clips. If the user explicitly asks\nfor Seedance below 4s, do not silently round up. 
Ask whether they prefer a 4s Seedance clip\nor an exact-duration LTX clip. If the user did not explicitly ask for Seedance, choose the\nmodel/tool that can satisfy the requested duration exactly.",
  "parameterDocs": {
  "prompt": "Video prompt. Use double quotes ONLY for spoken dialogue. Describe visual text without quotes.",
  "duration": "Clip duration in seconds. Plan dialogue word count against the 3.75 words/second ceiling."
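Both video contracts use the same arithmetic. Spoken words (text inside double quotes), estimated at 2.5 words per second plus about 1 second per acting beat, must fit the clip, and 3.75 words per second is a hard ceiling. Rounding that ceiling down reproduces the listed values (5s = 18, 6s = 22, 8s = 30, 10s = 37, 15s = 56, 20s = 75 words). The sketch below is a pre-submission check. Splitting words on whitespace and passing beats as a plain count are assumptions; the handler's own word counting may differ.

```typescript
// Count words inside double quotes; speaker tags and action prose are free.
function spokenWords(prompt: string): number {
  const quoted: string[] = prompt.match(/"[^"]*"/g) || [];
  let words = 0;
  for (const q of quoted) {
    const inner = q.slice(1, -1).trim();
    if (inner.length > 0) words += inner.split(/\s+/).length;
  }
  return words;
}

// Hard ceiling: 3.75 words/second, rounded down (5s -> 18, 10s -> 37, 20s -> 75).
function wordCeiling(durationSec: number): number {
  return Math.floor(durationSec * 3.75);
}

// Natural-delivery check: words / 2.5 + beats <= duration, plus the hard ceiling.
function fitsClip(prompt: string, durationSec: number, beats: number): boolean {
  const words = spokenWords(prompt);
  return words <= wordCeiling(durationSec) && words / 2.5 + beats <= durationSec;
}

const line = 'HOST: "We made it." GUEST: "Finally, after all that."';
console.log(spokenWords(line)); // 7
console.log(fitsClip(line, 5, 2)); // true (7 / 2.5 + 2 = 4.8 seconds)
```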
@@ -2216,7 +2216,7 @@ const PROMPT_CONTRACTS = [
  "contractId": "video_to_video_v1",
  "version": "1.0.0",
  "toolName": "video_to_video",
- "baseDescription": "video_to_video transforms an uploaded video. Use for uploaded-video restyling, enhancement,\nupscaling/remastering, motion transfer from video to image, subject replacement, edge/pose/\ndepth-guided restyle, or explicit Seedance V2V transforms.\n\nThis tool requires an uploaded video source. Do not use it for generated video indices. For\ngenerated or uploaded partial edits use replace_video_segment; for appended time use\nextend_video; for logos/text overlays use overlay_video; for stitching use stitch_video.\n\nChoose controlMode by intent. Use detailer for quality-only enhancement without restyling.\nUse seedance-v2v only when the user asks to transform/enhance/remaster an uploaded video\nwith Seedance. For detailer, describe the original scene plus quality terms, not new content.",
+ "baseDescription": "video_to_video transforms an uploaded video. Use for uploaded-video restyling, enhancement,\nupscaling/remastering, motion transfer from video to image, subject replacement, edge/pose/\ndepth-guided restyle, or explicit Seedance V2V transforms.\n\nThis tool requires an uploaded video source. Do not use it for generated video indices. For\ngenerated or uploaded partial edits use replace_video_segment; for appended time use\nextend_video; for logos/text overlays use overlay_video; for stitching use stitch_video.\n\nChoose controlMode by intent. Use detailer for quality-only enhancement without restyling.\nUse seedance-v2v only when the user asks to transform/enhance/remaster an uploaded video\nwith Seedance, including Seedance-fast uploaded-video upscale/remaster requests. For detailer,\ndescribe the original scene plus quality terms, not new content.",
  "parameterDocs": {
  "prompt": "Describe the target appearance in present tense. For detailer, describe the original content plus quality qualifiers only.",
  "videoSourceIndex": "Uploaded video index. Omit when there is one uploaded video; use 0 for first uploaded video or -1 if using negative upload notation.",
@@ -2240,7 +2240,7 @@ const PROMPT_CONTRACTS = [
  "contractId": "replace_video_segment_v1",
  "version": "1.0.0",
  "toolName": "replace_video_segment",
- "baseDescription": "Use replace_video_segment when the user wants to regenerate a specific time range of an\nexisting video: \"regenerate from Xs to Ys\", \"redo the last N seconds\", \"swap out the middle\",\n\"fix the [start/middle/end] of the video\", or \"replace the [bumper/intro/outro/end card/\ntag/sting] at the [start/end] of the video\". Use explicit startSeconds and endSeconds; use\n-1 sentinels when exact base duration is unknown - the handler probes and resolves.\n\nWhen the replacement is already another uploaded or generated video clip, still use\nreplace_video_segment but pass replacementVideoIndex. Example: \"splice video 2 into video 1\nat 5s\" means videoIndex=-1, replacementVideoIndex=-2, startSeconds=5, endSeconds=5.\nUse endSeconds=startSeconds for insertion; use a wider endSeconds only when the user says to\nreplace/remove that base-video range. Do not use stitch_video for \"into the middle\"/\"insert\"\nrequests, because stitch_video only concatenates full clips end-to-end.\n\nFor time-sliced interleaving from existing videos - \"alternate 1s from each video\", \"weave\none-second clips from video 1 and video 2\", \"cut back and forth every N seconds\" - do NOT\nuse stitch_video and do NOT omit replacementVideoIndex. Start with the first requested video\nas the base, then call replace_video_segment once for each window that should come from the\nother video. Set replacementVideoIndex to that other existing video and set\nreplacementStartSeconds/replacementEndSeconds to the next source slice from that\nreplacement video. For ordinary\nalternation, preserve the base duration: set endSeconds=startSeconds+sliceDuration, not\nendSeconds=startSeconds insertion, unless the user explicitly asks to lengthen the output by\ninserting extra slices. Skip no-op windows that already come from the base video; only splice\nwindows that should come from a different source. Example for two 10s uploads alternating every 1s starting with video\n1: replace base windows 1..2, 3..4,\n5..6, 7..8, and 9..10 with slices 0..1, 1..2, 2..3, 3..4, and 4..5 from video 2. After\neach successful splice, target the newest composite video index for the next splice.\nThe -1 time sentinel applies only to base startSeconds/endSeconds when the base duration is\nunknown. Never use -1 for replacementStartSeconds or replacementEndSeconds; source windows\nmust use concrete non-negative seconds. For uploaded/generated videos with duration metadata,\nuse that known duration directly; do not call analyze_video just to learn the clip length for\nroutine alternating slices. Do not add a final tail splice with an unknown source end - stop at\nthe known clip duration or skip a no-op tail window.\n\nDo NOT call generate_video or animate_photo to re-render an existing video just to change\npart of it (the bumper, the intro, the end card, a single scene, the last few seconds, etc.).\nUse replace_video_segment - it preserves the unchanged portion, keeps the original audio\noutside the replaced window, and costs far less.\n\nAuto-detects the base video's model, so OMIT videoModel unless the user explicitly demands\na different model. Short requested windows are supported by rendering with model-specific\nhandles and trimming the rendered clip before splicing, so still pass the user's exact\nstartSeconds/endSeconds.",
+ "baseDescription": "Use replace_video_segment when the user wants to regenerate a specific time range of an\nexisting video: \"regenerate from Xs to Ys\", \"redo the last N seconds\", \"swap out the middle\",\n\"fix the [start/middle/end] of the video\", or \"replace the [bumper/intro/outro/end card/\ntag/sting] at the [start/end] of the video\". Use explicit startSeconds and endSeconds.\nFor relative requests like \"last 3 seconds\", resolve against the known base duration when\nduration metadata or prior tool arguments provide it. For \"bumper/end card/outro at the end\"\nwithout exact seconds, use the known storyboard timing when available; otherwise choose a\nsmall end-card window such as the final 1-3 seconds based on the base duration. If the base\nduration/window is genuinely unknown, inspect the video first or ask for the missing window;\ndo not submit ambiguous placeholder times.\n\nWhen the replacement is already another uploaded or generated video clip, still use\nreplace_video_segment but pass replacementVideoIndex. Example: \"splice video 2 into video 1\nat 5s\" means videoIndex=-1, replacementVideoIndex=-2, startSeconds=5, endSeconds=5.\nUse endSeconds=startSeconds for insertion; use a wider endSeconds only when the user says to\nreplace/remove that base-video range. Do not use stitch_video for \"into the middle\"/\"insert\"\nrequests, because stitch_video only concatenates full clips end-to-end.\n\nFor time-sliced interleaving from existing videos - \"alternate 1s from each video\", \"weave\none-second clips from video 1 and video 2\", \"cut back and forth every N seconds\" - do NOT\nuse stitch_video and do NOT omit replacementVideoIndex. Start with the first requested video\nas the base, then call replace_video_segment once for each window that should come from the\nother video. Set replacementVideoIndex to that other existing video and set\nreplacementStartSeconds/replacementEndSeconds to the next source slice from that\nreplacement video. For ordinary\nalternation, preserve the base duration: set endSeconds=startSeconds+sliceDuration, not\nendSeconds=startSeconds insertion, unless the user explicitly asks to lengthen the output by\ninserting extra slices. Skip no-op windows that already come from the base video; only splice\nwindows that should come from a different source. Example for two 10s uploads alternating every 1s starting with video\n1: replace base windows 1..2, 3..4,\n5..6, 7..8, and 9..10 with slices 0..1, 1..2, 2..3, 3..4, and 4..5 from video 2. After\neach successful splice, target the newest composite video index for the next splice.\nThe -1 time sentinel applies only to base startSeconds/endSeconds when the base duration is\nunknown. Never use -1 for replacementStartSeconds or replacementEndSeconds; source windows\nmust use concrete non-negative seconds. For uploaded/generated videos with duration metadata,\nuse that known duration directly; do not call analyze_video just to learn the clip length for\nroutine alternating slices. Do not add a final tail splice with an unknown source end - stop at\nthe known clip duration or skip a no-op tail window.\n\nDo NOT call generate_video or animate_photo to re-render an existing video just to change\npart of it (the bumper, the intro, the end card, a single scene, the last few seconds, etc.).\nUse replace_video_segment - it preserves the unchanged portion, keeps the original audio\noutside the replaced window, and costs far less.\n\nAuto-detects the base video's model, so OMIT videoModel unless the user explicitly demands\na different model. Short requested windows are supported by rendering with model-specific\nhandles and trimming the rendered clip before splicing, so still pass the user's exact\nstartSeconds/endSeconds.",
  "parameterDocs": {
  "startSeconds": "Start of segment to replace in seconds. Use -1 sentinel if exact base duration is unknown.",
  "endSeconds": "End of segment to replace in seconds. Use the same value as startSeconds for insertion with replacementVideoIndex.",
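The replace_video_segment description spells out an interleaving recipe: keep the first video as the base and replace every other slice-sized window with the next slice from the other video. The sketch below generates that sequence of arguments and reproduces the worked example for two 10s uploads; the function name and the shape of the returned objects are hypothetical, and video-index bookkeeping (targeting the newest composite after each splice) is left to the caller.

```javascript
// Hypothetical planner for "alternate N seconds from each video" requests.
// Produces base-video startSeconds/endSeconds plus the matching source slice
// from the replacement video, following the worked example in the tool description.
function planAlternatingSplices(baseDurationSeconds, replacementDurationSeconds, sliceSeconds) {
  if (!(sliceSeconds > 0)) throw new RangeError('sliceSeconds must be positive');
  const calls = [];
  let sourceStart = 0;
  // The window starting at 0 already comes from the base video (a no-op), so start one slice in.
  for (
    let start = sliceSeconds;
    start + sliceSeconds <= baseDurationSeconds;
    start += 2 * sliceSeconds
  ) {
    const sourceEnd = sourceStart + sliceSeconds;
    // Source windows must be concrete seconds; stop instead of guessing past the known end.
    if (sourceEnd > replacementDurationSeconds) break;
    calls.push({
      startSeconds: start,
      endSeconds: start + sliceSeconds, // replace rather than insert, so the base duration is preserved
      replacementStartSeconds: sourceStart,
      replacementEndSeconds: sourceEnd,
    });
    sourceStart = sourceEnd;
  }
  return calls;
}

for (const call of planAlternatingSplices(10, 10, 1)) {
  console.log(`${call.startSeconds}..${call.endSeconds} <- ${call.replacementStartSeconds}..${call.replacementEndSeconds}`);
}
// 1..2 <- 0..1
// 3..4 <- 1..2
// 5..6 <- 2..3
// 7..8 <- 3..4
// 9..10 <- 4..5
```

Every window uses concrete non-negative seconds, which matches the rule that the -1 sentinel never applies to replacementStartSeconds or replacementEndSeconds.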
@@ -2433,9 +2433,9 @@ const PROMPT_CONTRACTS = [
  "contractId": "finalize_response_v1",
  "version": "1.1.0",
  "toolName": "finalize_response",
- "baseDescription": "finalize_response marks the turn complete and stops the tool loop. Use after the requested\nworkflow succeeds, partially succeeds, fails with a surfaced error, or needs no tool action.\n\nWhen the user asked for a script, storyboard, ad concept, trailer, creator video, meme/parody,\nor music prompt and no media tool is required, deliver the final creative in a clean Markdown\ncontract: title, concept/objective, audience if relevant, timed beats or script, audio/text\nnotes, generation prompt(s), CTA, and brief assumptions. For revisions, apply the feedback\ndirectly while preserving approved elements and rejected constraints.\n\nDo not call any other tool after finalize_response. Keep the summary short and grounded in\nactual tool results; do not claim exact metadata that no tool returned.",
+ "baseDescription": "finalize_response marks the turn complete and stops the tool loop. Use after the requested\nworkflow succeeds, partially succeeds, fails with a surfaced error, or needs no tool action.\n\nWhen the user asked for a script, storyboard, ad concept, trailer, creator video, meme/parody,\nor music prompt and no media tool is required, deliver the final creative in a clean Markdown\ncontract: title, concept/objective, audience if relevant, timed beats or script, audio/text\nnotes, generation prompt(s), CTA, and brief assumptions. For revisions, apply the feedback\ndirectly while preserving approved elements and rejected constraints.\n\nDo not call any other tool after finalize_response. Keep the summary short and grounded in\nactual tool results; do not claim exact metadata that no tool returned.\nFor no-action/text-only answers, such as product, feature, model, pricing, or capability\nquestions, the summary is the final answer the user sees. Provide the substantive answer\nthere; never leave it empty and never use a placeholder like \"Done.\"",
  "parameterDocs": {
- "summary": "Short user-visible closeout. Mention produced media or the concrete blocker; avoid duplicating prior tool output.",
+ "summary": "User-visible closeout. For no-action/text-only answers, include the complete substantive answer here. For media workflows, mention produced media or the concrete blocker; avoid duplicating prior tool output.",
  "outcome": "success, partial, asked_user, failed, or no_action based on the actual turn outcome."
  }
  },
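Version 1.1.0 of the finalize_response contract makes the summary carry the whole answer for text-only turns. A client that wants to enforce this before accepting the tool call could run a check like the following. The validator, its name, and the placeholder list are assumptions; only the outcome values and the "Done." example come from the contract text.

```javascript
// Outcome values listed in the finalize_response parameterDocs.
const FINALIZE_OUTCOMES = new Set(['success', 'partial', 'asked_user', 'failed', 'no_action']);
// Illustrative placeholder list; the contract only names "Done." as an example.
const PLACEHOLDER_SUMMARIES = new Set(['done', 'done.']);

// Hypothetical validator: returns a list of problems (empty when the call looks acceptable).
function validateFinalizeResponse({ summary, outcome }) {
  const problems = [];
  if (!FINALIZE_OUTCOMES.has(outcome)) problems.push(`unknown outcome: ${outcome}`);
  const text = typeof summary === 'string' ? summary.trim() : '';
  // For text-only turns the summary is the answer the user sees.
  if (outcome === 'no_action') {
    if (!text) problems.push('empty summary for a no_action answer');
    else if (PLACEHOLDER_SUMMARIES.has(text.toLowerCase())) problems.push(`placeholder summary: ${text}`);
  }
  return problems;
}

console.log(validateFinalizeResponse({ summary: 'Done.', outcome: 'no_action' }));
// [ 'placeholder summary: Done.' ]
```

Media outcomes such as `success` are left alone here, since the contract still allows a short closeout when tool output already reached the user.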
@@ -2,7 +2,7 @@
  "id": "sogni-creative-agent-skill",
  "name": "Sogni Creative Agent Skill - Image, Video & Music Generation",
  "description": "Agent skill and CLI for Sogni AI image, video, and music generation.",
- "version": "3.1.1",
+ "version": "3.3.3",
  "skills": [
  "."
  ],
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@sogni-ai/sogni-creative-agent-skill",
- "version": "3.3.1",
+ "version": "3.3.3",
  "description": "Sogni Creative Agent Skill: agent skill and CLI for Sogni AI image, video, and music generation.",
  "type": "module",
  "main": "sogni-agent.mjs",
@@ -67,7 +67,7 @@
  "sogni-agent.mjs"
  ],
  "dependencies": {
- "@sogni-ai/sogni-intelligence-client": "^2.4.0",
+ "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
  "execa": "^9.6.1",
  "json5": "^2.2.3",
  "sharp": "^0.34.5"
@@ -3,7 +3,7 @@
  "private": true,
  "type": "module",
  "dependencies": {
- "@sogni-ai/sogni-intelligence-client": "^2.4.0",
+ "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
  "execa": "^9.6.1",
  "json5": "^2.2.3",
  "sharp": "^0.34.5"
package/sogni-agent.mjs CHANGED
@@ -721,6 +721,10 @@ function buildBalanceError(message, details) {
  return err;
  }
 
+ function isStructuredInsufficientBalanceError(error) {
+ return Boolean(error && typeof error === 'object' && error.code === 'INSUFFICIENT_BALANCE');
+ }
+
  function gcdInt(a, b) {
  let x = Math.abs(Math.trunc(a));
  let y = Math.abs(Math.trunc(b));
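The new predicate trusts only the structured error code. The comparison below copies it from the added lines and rebuilds the 3.3.1 condition from the line removed in `main()`; the sample error messages are made up to show the difference.

```javascript
// Added in 3.3.3 (copied from the diff above).
function isStructuredInsufficientBalanceError(error) {
  return Boolean(error && typeof error === 'object' && error.code === 'INSUFFICIENT_BALANCE');
}

// The 3.3.1 condition, rebuilt from the removed line in main() for comparison.
function legacyIsBalanceError(error) {
  return error.code === 'INSUFFICIENT_BALANCE' || /insufficient/i.test(error.message);
}

// Hypothetical errors for illustration.
const structured = Object.assign(new Error('Balance too low'), { code: 'INSUFFICIENT_BALANCE' });
const unrelated = new Error('Insufficient permissions to read the persona directory');

console.log(legacyIsBalanceError(unrelated)); // true: the message regex matched
console.log(isStructuredInsufficientBalanceError(unrelated)); // false: no structured code
console.log(isStructuredInsufficientBalanceError(structured)); // true
console.log(isStructuredInsufficientBalanceError(null)); // false, where the legacy check throws a TypeError
```

In practice, an error whose message merely contains "insufficient" no longer triggers the SPARK-to-SOGNI retry.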
@@ -4054,8 +4058,7 @@ function buildSkillDynamicSystemPrompt() {
  }
  suffix += `\nUser's people: ${personaContext}.`;
  suffix += '\n\nPERSONA RULES:'
- + '\n- "me"/"I"/"myself" = the person marked (self) - match by relationship when the user uses self-referencing pronouns.'
- + '\n- Match personas by explicit name, self-referencing pronouns, OR relationship phrases ("my wife", "my son", "my dog", etc.).'
+ + '\n- Match personas only by explicit listed name or tag/alias. Do not infer persona identity from relationship phrases alone.'
  + '\n- When creating images of personas, prefer image-editing with the persona\'s reference photo over generating from scratch.'
  + '\n- If the user mentions someone not listed, suggest adding them via `--persona-add`.';
  }
@@ -4129,6 +4132,45 @@ function apiChatTemplateKwargs() {
  return { enable_thinking: options.apiThinking };
  }
 
+ function chatRunEventPayload(event) {
+ if (!event || typeof event !== 'object') return event;
+ return event.payload || event.data || event;
+ }
+
+ function chatRunAssistantDelta(type, payload) {
+ if (type === 'assistant_message_delta' && typeof payload?.content === 'string') {
+ return payload.content;
+ }
+ if (
+ chatRunTerminalStatus(type, payload)
+ || chatRunFailureStatus(type)
+ || chatRunWaitingStatus(type)
+ || type === 'tool_call_progress'
+ ) {
+ return null;
+ }
+ return payload?.delta?.content
+ || payload?.choices?.[0]?.delta?.content
+ || (typeof payload?.content === 'string' ? payload.content : null);
+ }
+
+ function chatRunTerminalStatus(type, payload) {
+ if (type === 'run_completed' || type === 'run.completed' || type === 'completed' || type === 'done') {
+ return payload?.status || 'completed';
+ }
+ if (type === 'run_partial_failure') return payload?.status || 'partial_failure';
+ if (type === 'run_cancelled' || type === 'cancelled') return payload?.status || 'cancelled';
+ return null;
+ }
+
+ function chatRunFailureStatus(type) {
+ return type === 'run_failed' || type === 'run.failed' || type === 'failed' || type === 'error';
+ }
+
+ function chatRunWaitingStatus(type) {
+ return type === 'run_waiting_for_user' || type === 'waiting_for_user';
+ }
+
  async function runApiChat(log) {
  const creds = loadCredentials();
  const apiKey = requireApiKeyCredentials(creds, '--api-chat');
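These helpers normalize durable chat-run events before the stream loop sees them. The sketch copies the five added functions (re-indented) and feeds them a made-up event sequence; the event shapes and the `collectAssistantText` wrapper are hypothetical, chosen to exercise each branch rather than captured from the server.

```javascript
// Copied from the added lines above.
function chatRunEventPayload(event) {
  if (!event || typeof event !== 'object') return event;
  return event.payload || event.data || event;
}

function chatRunAssistantDelta(type, payload) {
  if (type === 'assistant_message_delta' && typeof payload?.content === 'string') {
    return payload.content;
  }
  if (
    chatRunTerminalStatus(type, payload)
    || chatRunFailureStatus(type)
    || chatRunWaitingStatus(type)
    || type === 'tool_call_progress'
  ) {
    return null;
  }
  return payload?.delta?.content
    || payload?.choices?.[0]?.delta?.content
    || (typeof payload?.content === 'string' ? payload.content : null);
}

function chatRunTerminalStatus(type, payload) {
  if (type === 'run_completed' || type === 'run.completed' || type === 'completed' || type === 'done') {
    return payload?.status || 'completed';
  }
  if (type === 'run_partial_failure') return payload?.status || 'partial_failure';
  if (type === 'run_cancelled' || type === 'cancelled') return payload?.status || 'cancelled';
  return null;
}

function chatRunFailureStatus(type) {
  return type === 'run_failed' || type === 'run.failed' || type === 'failed' || type === 'error';
}

function chatRunWaitingStatus(type) {
  return type === 'run_waiting_for_user' || type === 'waiting_for_user';
}

// Hypothetical sample stream covering payload/data envelopes and the suppressed event types.
const sampleEvents = [
  { type: 'assistant_message_delta', payload: { content: 'Hel' } },
  { type: 'assistant_message_delta', data: { content: 'lo' } },
  { type: 'tool_call_progress', data: { content: '42% rendered' } },
  { event: 'message', data: { choices: [{ delta: { content: '!' } }] } },
  { type: 'run_completed', data: { status: 'completed', content: 'final text' } },
];

// Mirrors how the stream loop derives `type`, `payload`, and `delta` for each event.
function collectAssistantText(events) {
  const parts = [];
  for (const event of events) {
    const type = event?.type || event?.event || '';
    const delta = chatRunAssistantDelta(type, chatRunEventPayload(event));
    if (typeof delta === 'string' && delta) parts.push(delta);
  }
  return parts.join('');
}

console.log(collectAssistantText(sampleEvents)); // Hello!
```

Run through the removed 3.3.1 expression (`event?.data || event` followed by the three-way `content` fallback), the same sample would lose the `payload`-wrapped "Hel" and would treat the `tool_call_progress` and `run_completed` contents as assistant text.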
@@ -4293,12 +4335,9 @@ async function runApiChatDurable(log, { apiKey, body }) {
 
  for await (const event of helpers.sdkChatRunsStreamEvents(client, runId, {})) {
  const type = event?.type || event?.event || '';
- const payload = event?.data || event;
+ const payload = chatRunEventPayload(event);
  // Stream assistant message deltas as they arrive.
- const delta =
- payload?.delta?.content
- || payload?.choices?.[0]?.delta?.content
- || (typeof payload?.content === 'string' ? payload.content : null);
+ const delta = chatRunAssistantDelta(type, payload);
  if (typeof delta === 'string' && delta) {
  assistantParts.push(delta);
  if (!options.json) {
@@ -4364,16 +4403,25 @@ async function runApiChatDurable(log, { apiKey, body }) {
  if (Array.isArray(eventWorkflows) && eventWorkflows.length > 0) {
  workflows.push(...eventWorkflows);
  }
- if (type === 'run.completed' || type === 'completed' || type === 'done') {
- finalStatus = payload?.status || 'completed';
+ const terminalStatus = chatRunTerminalStatus(type, payload);
+ if (terminalStatus) {
+ finalStatus = terminalStatus;
  break;
  }
- if (type === 'run.failed' || type === 'failed' || type === 'error') {
+ if (chatRunFailureStatus(type)) {
  const error = new Error(payload?.error?.message || 'Durable chat run failed.');
  error.code = payload?.error?.code || 'DURABLE_CHAT_RUN_FAILED';
  error.details = { runId, payload };
  throw error;
  }
+ if (chatRunWaitingStatus(type)) {
+ finalStatus = payload?.status || 'waiting_for_user';
+ if (!options.json) {
+ const reason = payload?.reason || payload?.waiting?.reason || 'user input required';
+ log(`Durable chat run is waiting for user input: ${reason}`);
+ }
+ break;
+ }
  }
  },
  );
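The reworked loop now exits on three kinds of events: terminal statuses, failures, and waiting-for-user. A trimmed, self-contained version of that control flow is sketched below. The status helpers are copied from the earlier hunk; `consumeRunStatus`, `fromArray`, and the sample events are hypothetical, and delta streaming, workflow collection, logging, and `error.details` are left out.

```javascript
// Status helpers copied from the 3.3.3 diff.
function chatRunEventPayload(event) {
  if (!event || typeof event !== 'object') return event;
  return event.payload || event.data || event;
}
function chatRunTerminalStatus(type, payload) {
  if (type === 'run_completed' || type === 'run.completed' || type === 'completed' || type === 'done') {
    return payload?.status || 'completed';
  }
  if (type === 'run_partial_failure') return payload?.status || 'partial_failure';
  if (type === 'run_cancelled' || type === 'cancelled') return payload?.status || 'cancelled';
  return null;
}
function chatRunFailureStatus(type) {
  return type === 'run_failed' || type === 'run.failed' || type === 'failed' || type === 'error';
}
function chatRunWaitingStatus(type) {
  return type === 'run_waiting_for_user' || type === 'waiting_for_user';
}

// Hypothetical trimmed loop: same ordering of checks as the hunk above.
async function consumeRunStatus(events) {
  let finalStatus = null;
  for await (const event of events) {
    const type = event?.type || event?.event || '';
    const payload = chatRunEventPayload(event);
    const terminalStatus = chatRunTerminalStatus(type, payload);
    if (terminalStatus) {
      finalStatus = terminalStatus;
      break;
    }
    if (chatRunFailureStatus(type)) {
      const error = new Error(payload?.error?.message || 'Durable chat run failed.');
      error.code = payload?.error?.code || 'DURABLE_CHAT_RUN_FAILED';
      throw error;
    }
    if (chatRunWaitingStatus(type)) {
      // A run that needs user input stops the loop with a waiting status.
      finalStatus = payload?.status || 'waiting_for_user';
      break;
    }
  }
  return finalStatus;
}

// Hypothetical async event source standing in for the SDK stream.
async function* fromArray(items) {
  for (const item of items) yield item;
}

consumeRunStatus(fromArray([
  { type: 'tool_call_progress', payload: {} },
  { type: 'run_waiting_for_user', payload: { reason: 'choose a model' } },
  { type: 'run_completed' },
])).then((status) => console.log(status)); // waiting_for_user
```

Because the waiting check breaks out of the loop, the later `run_completed` event in this sample is never read.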
@@ -5363,20 +5411,11 @@ function resolvePersonaByName(name) {
  // Match by name (case-insensitive)
  let match = personas.find(p => p.name.toLowerCase() === name.toLowerCase());
  if (match) return match;
+ // Match by stable id
+ match = personas.find(p => typeof p.id === 'string' && p.id.toLowerCase() === name.toLowerCase());
+ if (match) return match;
  // Match by tag
  match = personas.find(p => p.tags?.some(t => t.toLowerCase() === name.toLowerCase()));
- if (match) return match;
- // Match implicit pronouns
- const lower = name.toLowerCase();
- if (lower === 'me' || lower === 'myself' || lower === 'i') {
- match = personas.find(p => p.relationship === 'self');
- } else if (lower.includes('wife') || lower.includes('husband') || lower.includes('partner')) {
- match = personas.find(p => p.relationship === 'partner');
- } else if (lower.includes('son') || lower.includes('daughter') || lower.includes('kid') || lower.includes('child')) {
- match = personas.find(p => p.relationship === 'child');
- } else if (lower.includes('dog') || lower.includes('cat') || lower.includes('pet')) {
- match = personas.find(p => p.relationship === 'pet');
- }
  return match || null;
  }
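After this change a persona is found by saved name, then stable id, then tag, all case-insensitive, and relationship phrases no longer resolve on their own. The sketch restates that order with the persona list passed in as an argument (the real function reads a `personas` value defined outside this hunk); `resolvePersona` and the sample personas are invented for illustration.

```javascript
// Same lookup order as resolvePersonaByName in 3.3.3, with `personas` passed explicitly.
function resolvePersona(personas, name) {
  const key = name.toLowerCase();
  return (
    personas.find((p) => p.name.toLowerCase() === key)
    || personas.find((p) => typeof p.id === 'string' && p.id.toLowerCase() === key)
    || personas.find((p) => p.tags?.some((t) => t.toLowerCase() === key))
    || null
  );
}

// Hypothetical saved personas. `relationship` mirrors the field the removed code matched on;
// it plays no part in lookup now.
const personas = [
  { id: 'p-mark', name: 'Mark', relationship: 'self', tags: ['me'] },
  { id: 'p-ana', name: 'Ana', relationship: 'partner', tags: ['wife'] },
];

console.log(resolvePersona(personas, 'MARK')?.name); // Mark (name)
console.log(resolvePersona(personas, 'p-ana')?.name); // Ana (id)
console.log(resolvePersona(personas, 'wife')?.name); // Ana (tag)
console.log(resolvePersona(personas, 'my wife')); // null (no relationship inference)
```

In this sketch, a pronoun such as `me` still reaches a persona only because it was saved as an explicit tag, which matches the README's "tag/alias" wording.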
 
@@ -7963,7 +8002,7 @@ async function main() {
 
  } catch (error) {
  // Token auto-fallback: if using auto mode and got insufficient balance, retry with the other token
- const isBalanceError = error.code === 'INSUFFICIENT_BALANCE' || /insufficient/i.test(error.message);
+ const isBalanceError = isStructuredInsufficientBalanceError(error);
  if (_allowAutoTokenFallback && isBalanceError && options.tokenType === 'spark') {
  log('Insufficient SPARK balance - retrying with SOGNI tokens...');
  options.tokenType = 'sogni';
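The retry path in `main()` now fires only for a structured `INSUFFICIENT_BALANCE` error while SPARK is selected and auto fallback is allowed. Condensed into a standalone helper it looks roughly like this. `runWithTokenFallback` and its `run(tokenType)` callback are hypothetical stand-ins for the CLI's generation call, and the sketch simply calls `run` a second time with `'sogni'` rather than reproducing how `main()` re-enters its own flow.

```javascript
// Copied from the 3.3.3 diff.
function isStructuredInsufficientBalanceError(error) {
  return Boolean(error && typeof error === 'object' && error.code === 'INSUFFICIENT_BALANCE');
}

// Hypothetical helper: `run(tokenType)` stands in for the actual generation call.
async function runWithTokenFallback(run, { tokenType, allowAutoFallback, log = console.log }) {
  try {
    return await run(tokenType);
  } catch (error) {
    // Same three conditions as the `if` in main(): auto fallback allowed, structured balance error, SPARK selected.
    if (allowAutoFallback && isStructuredInsufficientBalanceError(error) && tokenType === 'spark') {
      log('Insufficient SPARK balance - retrying with SOGNI tokens...');
      return run('sogni');
    }
    throw error; // anything else, including message-only "insufficient" errors, surfaces unchanged
  }
}

// Usage with a fake generator whose SPARK balance is always too low.
const fakeRun = async (tokenType) => {
  if (tokenType === 'spark') {
    throw Object.assign(new Error('SPARK balance too low'), { code: 'INSUFFICIENT_BALANCE' });
  }
  return `rendered with ${tokenType}`;
};

runWithTokenFallback(fakeRun, { tokenType: 'spark', allowAutoFallback: true })
  .then((result) => console.log(result)); // logs the retry line, then: rendered with sogni
```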
package/version.mjs CHANGED
@@ -1 +1 @@
- export const PACKAGE_VERSION = '3.3.1';
+ export const PACKAGE_VERSION = '3.3.3';