npm - @optima-chat/gen-cli - Versions diffs - 2.6.0 → 2.6.1 - Mend

@optima-chat/gen-cli 2.6.0 → 2.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/.claude/skills/motion-control/SKILL.md DELETED

@@ -1,68 +0,0 @@
----
-name: motion-control
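-description: "Uses the motion from a reference video to drive a character image, producing a video of that character performing the same motion (kling motion-control). Triggers: the user supplies a character image plus a motion reference video and says something like '让这个人按这个动作动起来' (make this person move like this), '用这个视频的动作做我这张图的视频' (use this video's motion to animate my image), 'motion transfer', '动作克隆' (motion cloning), or '让 X 跳这段舞' (make X do this dance). Scope: a real person or character in frame performing body movement that follows the reference; not talking (talking goes to digital-human) and not product demos (those go to video-gen)."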
-version: 1.0.0
-owner_repo: Optima-Chat/optima-gen
----
-# Motion Control Skill
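-Turns "one character image + one motion reference video" into a video in which the character from the image performs the reference motion. Driven by kling-2.6/motion-control, 720p, $0.10 per second.
-> **CLI version requirement**: `@optima-chat/gen-cli ≥ 2.2.0` (includes the `gen video motion-control` subcommand). If it reports `unknown command`, guide the user to upgrade; do **not** hand-assemble HTTP calls.
-> **User-facing output rule**: status messages, errors, and summaries must **not mention upstream brand names**; use neutral wording such as "motion-driven video generation in progress" and "video generation complete".
-## Scenarios this skill does not cover (route them elsewhere first)
-| User intent | Route to |
-|---|---|
-| A real person or digital human **talking** | `digital-human` |
-| **Product** video (PDP / social ads / recreating a viral video) | `video-gen` |
-| Pure text-to-video or image-to-video with **no person** (landscapes, animals) | `gen video` (basic video) |
-| Translating an existing talking video | `video-translate` |
-| Editing an existing video | `video-edit` |
-motion-control handles **only** this case: there is a specific character image, and the user wants that character to move according to a given motion.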
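-## Input requirements (ask first if unmet)
-- **Character image**: jpeg/png, ≥300px, aspect ratio between 2:5 and 5:2 (that is, no narrower than 2:5 and no wider than 5:2). **A single, clearly visible subject**: background people or multiple people are rejected by the server.
-- **Reference video**: mp4/mov/mkv, 3-30 seconds, ≤100MB, and **a complete, recognizable upper body must be in frame**. The server runs human detection; if no upper body is found, it fails with `No complete upper body detected`.
-- **If either input is missing**, ask the user to supply it; do not proceed without it.
-## Workflow (minimal)
-1. **Confirm both inputs are present** (local paths and http(s) URLs both work)
-2. **State the estimated cost**:
-   - Local video: get the duration `D` with `ffprobe`
-   - URL: you must ask the user how long the video is (the CLI cannot probe a URL automatically)
-   - Cost = `D × $0.10`; tell the user and wait for a "continue"
-3. **Run**: `gen video motion-control --image <img> --reference-video <vid> [--character-orientation image|video] [--prompt "..."]`
-   - If duration is omitted, the CLI runs ffprobe on a local video automatically; a URL requires an explicit `--duration`
-   - `character-orientation`: defaults to `video` (output duration = reference video duration); if the user says to prioritize preserving the character's appearance, use `image` (output is truncated to ≤10s)
-4. **Download + present**: the CLI downloads automatically to `./gen-output/motion_<id>.mp4`; tell the user the path
-5. **Failure handling**: see the table below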
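-## Error handling
-| Server error | Meaning + handling |
-|---|---|
-| `No complete upper body detected in the video` | No complete upper body was detected in the reference video. Ask the user for a clearer motion video, or to trim to a segment that shows the upper body |
-| `file format not support` | Rarely triggered (already fixed; uploads go through the kie File Upload API). If it recurs, report a bug; do not retry on your own |
-| `duration out of range` | The duration is outside 3-30s (or over 10s with image orientation). Trim the reference video or retry with the other orientation |
-| `INSUFFICIENT_CREDITS` | The user is out of credits; tell them to top up |
-| Task timeout / network failure | Check the latest status with `gen task get <id>`; it is normal for the server to finish while the CLI poll times out, and the result is still available |
-## Things not to do
-- **Do not run ffmpeg yourself** to concatenate, cut, or transcode motion-control output. This skill only generates the motion-control video; post-editing goes to `video-edit`
-- **Do not support "change segment N"**. motion-control is a single atomic generation with no storyboard or multi-segment concept. If the user wants changes, generate again
-- **Do not pass `--mode`**. The server is locked to 720p; the CLI accepts the flag but it has no effect
-- **Do not use it for merchants' product videos**. Product videos go to `video-gen`; motion-control does not suit pure-object scenes such as "rotate the product once" (the reference video must contain a person)
-## Related tools
-- `gen video motion-control`: the only command this skill executes
-- `gen task get <id>`: check task status (use after a polling timeout)
-- `ffprobe`: gets local video duration; the CLI already uses it internally, so the skill rarely calls it directly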
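
Reviewer sketch (not part of the package): the removed skill's pre-flight rules, the `D × $0.10` cost estimate, the 3-30 s duration window, and the 2:5 to 5:2 aspect-ratio window, can be expressed as two small checks. The function names are hypothetical, and applying the 300px minimum to the shorter side is an assumption, since the skill text only says "≥300px".

```python
from fractions import Fraction

RATE_USD_PER_SECOND = 0.10          # from the skill text: $0.10 per second
MIN_SECONDS, MAX_SECONDS = 3, 30    # reference video bounds, from the skill text
MIN_SIDE_PX = 300                   # assumption: the limit applies to the shorter side


def estimate_cost_usd(duration_s: float) -> float:
    """Return D x $0.10, rejecting durations outside the 3-30 s window."""
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise ValueError(f"duration out of range: {duration_s}s")
    return round(duration_s * RATE_USD_PER_SECOND, 2)


def image_is_acceptable(width: int, height: int) -> bool:
    """Check the minimum size and the 2:5 to 5:2 aspect-ratio window."""
    if min(width, height) < MIN_SIDE_PX:
        return False
    # Fractions avoid float rounding at the exact 2:5 and 5:2 boundaries.
    ratio = Fraction(width, height)
    return Fraction(2, 5) <= ratio <= Fraction(5, 2)
```

For a 12-second local reference video, `estimate_cost_usd(12)` gives the figure to quote to the user before waiting for their "continue".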

package/.claude/skills/multigrid-poster/SKILL.md DELETED

@@ -1,194 +0,0 @@
----
-name: multigrid-poster
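-description: "Generates Xiaohongshu 2×2 four-grid / 3×3 nine-grid cover images for merchants. Triggers: 做小红书封面 (make a Xiaohongshu cover) / 小红书首图 (Xiaohongshu first image) / 种草帖封面 (recommendation-post cover) / 爆款封面 (viral cover) / 四宫格 (four-grid) / 九宫格 (nine-grid). A one-sentence instruction yields a finished 1242×1660 image, with natural-language iteration (change layout / redraw a cell / edit copy). This skill only generates cover images; to search Xiaohongshu notes or analyze creators, use the 'xhs' skill."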
-version: 1.0.0
-owner_repo: Optima-Chat/optima-gen
----
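-# Xiaohongshu multi-grid cover generation
-Helps e-commerce merchants composite Xiaohongshu covers from AI images plus a generic grid layout. **One sentence takes you from intent to a finished 1242×1660 image**, in either a 4-grid or 9-grid layout.
-## Global Rules
-These take priority over any pipeline step.
-1. **No model or service names in user-facing text**
-   For status, cost, and progress, always say "generating cover / generating assets / compositing cover". `gen image` is fine as a CLI literal, but do not echo the full command back to the user.
-2. **COST-GATE is mandatory before spending**
-   Run one cost confirmation before any batch of `gen image` calls. **No exceptions for Fast-path, iteration, or retries**. 2×2 = 4 images, 3×3 = 9 images, SKU image pull = 0 images. Estimate the rate at 1 credit per `gen image` call.
-3. **Per-post init is the first step of every pipeline**
-   Create the directory and cd into it before generating or compositing; otherwise files get written to the wrong place. For iterations use `{old-id}-vN`.
-4. **Anti-fabrication**
-   Do not improvise commands, flags, or parameters that this skill does not explicitly list. The same applies to `gen image`, `commerce`, and `compose.py` subcommands.
-5. **No automatic posting**
-   Produce the PNG only; **never call** any automatic login, posting, or upload command. When done, give the path and let the user download and post manually.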
-## Working directory
-```
-~/multigrid-poster/
-├── preferences.md
-├── history.md
-└── posters/{post-id}/
-    ├── intent.md             # user intent + layout + copy
-    ├── cells/cell_0..N.png   # 4 or 9 assets
-    ├── cover.png             # finished image
-    └── cost.md
-```
-The whole of `~/multigrid-poster/` is a git repo. After each step, run `git add -A && git commit`.
-## Startup flow
-1. **First run** (`ls ~/multigrid-poster/preferences.md` reports no such file): run `mkdir -p ~/multigrid-poster/posters && cd ~/multigrid-poster && git init -b main`, then create an empty `preferences.md` (fields: merchant_id / brand_name / category / xhs_account_id / default_layout) and `history.md` (header: date | post-id | layout | title | parent | status). Add `posters/**/cells/*.png` and `posters/**/cover.png` to `.gitignore`.
-2. **Scan for unfinished work**: if `posters/` has an entry with `intent.md` but no `cover.png` that is less than 7 days old, mention once: "You have N unfinished covers."
-3. **Read preferences.md / history.md**, then continue with the main flow.
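-## Main flow
-### Step 1: Choose the layout
-| User wording | layout | cells |
-|---|---|---|
-| Contains "九宫格 / 9 格 / 9 张 / 清单 / N 款 / 礼物推荐 / 榜单" (nine-grid / 9 cells / 9 images / list / N items / gift recommendations / ranking) | **3×3** | 9 |
-| Anything else (including "四宫格 / 4 格 / 4 张" (four-grid / 4 cells / 4 images), or unspecified) | **2×2** | 4 |
-### Step 2: Per-post init
-```bash
-# slug = kebab-case, taken from the first 20 characters of the user's intent
-POST_ID="$(date +%Y%m%d-%H%M)-<slug>"
-mkdir -p ~/multigrid-poster/posters/$POST_ID/cells
-cd ~/multigrid-poster/posters/$POST_ID
-```
-When iterating ("change layout / redraw / edit copy"): light iterations reuse the old directory; heavy iterations create a new `{old-id}-vN`.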
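-### Step 3: Write the copy
-The agent writes it itself, without calling an external generator. Constraints:
-- **title**: 2 lines × 8-12 characters per line (suits 2×2) or 2 lines × 6-10 characters per line (the 3×3 title area is tight)
-- **caption**: 2 lines × 15-20 characters per line
-- Hard bans: medical or health claims and absolute terms (最 / 第一 / 唯一 / 100%, that is, "most" / "number one" / "only" / "100%")
-Write to `intent.md`:
-```markdown
-# Intent
-| Field | Value |
-|---|---|
-| User's words | <verbatim> |
-| layout | 2x2 / 3x3 |
-| title line 1 | <8-12 characters> |
-| title line 2 | <8-12 characters> |
-| caption line 1 | <15-20 characters> |
-| caption line 2 | <15-20 characters> |
-```
-Show the user: "Title 'XXX / YYY', subtitle 'AAA / BBB'. OK?"
-- Fast-path (clear intent): inform the user and stop only if they say so
-- Ambiguous intent: you must wait for confirmation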
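-### Step 4: COST-GATE
-**Required before generating** (including Fast-path / iteration / retries):
-> About to generate a cover (layout: 2x2 / 3x3). Estimate:
-> - Asset calls: N (2×2=4 / 3×3=9 / SKU image pull=0)
-> - Estimated time: ~X minutes
-> - Estimated cost: ~Y credits
->
-> Continue?
-If the user says "continue / OK", run. If the user says "too expensive / something cheaper", propose a downgrade (3×3 → 2×2, or SKU image pull). If there is no reply, wait.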
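-### Step 5: Generate images
-**AI image generation is the default**. Call once per cell, in parallel:
-```bash
-# 2×2: cell size 621×830; 3×3: cell size 414×420
-gen image "<subprompt>" -W <W> -H <H> -o ./cells/cell_<i>.png -s <seed> -f png
-```
-The agent designs each cell's `<subprompt>` independently from the user's intent (different angle / different step / different scene / different SKU, and so on). `<seed>` is a deterministic hash(POST_ID + cell_index); iterations reuse the same cell's seed.
-**SKU image-pull mode** (the user explicitly asks to build a listicle from "my store's product images"):
-```bash
-commerce product list --limit 9
-```
-Download to `./cells/cell_0..8.png`. Fewer than 9 products: downgrade to 2×2 and take the first 4. Fewer than 4 products: report an error.
-**Failure tolerance**: if one cell fails, retry once (with a different seed). After two failures, use `${CLAUDE_SKILL_DIR}/scripts/placeholder.png` as a placeholder and tell the user "Cell N failed and has a placeholder for now; just say so if you want it redrawn."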
-### Step 6: Composite
-```bash
-python3 $CLAUDE_SKILL_DIR/scripts/compose.py \
-  --layout $CLAUDE_SKILL_DIR/layouts/2x2.json \
-  --cells ./cells/cell_0.png ./cells/cell_1.png ... \
-  --title-line "<title line 1>" \
-  --title-line "<title line 2>" \
-  --caption-line "<caption line 1>" \
-  --caption-line "<caption line 2>" \
-  --output ./cover.png
-```
-Dependency: Pillow (bundled in the container). Common failure causes:
-| Error | Handling |
-|---|---|
-| `cell count mismatch` | The layout requires 4 / 9 cells; check the `--cells` argument |
-| `font not found` | Check that `$CLAUDE_SKILL_DIR/shared/fonts/` is complete |
-| Chinese text renders as boxes | Same as above; the font did not load |
-### Step 7: Deliver
-Write `cost.md`, append one row to `~/multigrid-poster/history.md`, and tell the user:
-> The cover is at `~/multigrid-poster/posters/<POST_ID>/cover.png` and is ready to download and post.
-> To change the layout, edit the copy, or redraw a cell, just tell me.
-If the user says "good / perfect", append an entry under `Learned` in preferences.md, then commit.
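-## Iteration
-| Type | Steps rerun | New directory | Cost |
-|---|---|---|---|
-| Change layout (2×2 ↔ 3×3) | Copy → images → composite | Yes (`-vN`) | Full cost |
-| Redraw all | Images → composite | Yes | All assets |
-| Redraw single cell N | Images (one cell) → composite | No | 1 asset |
-| Edit copy | Composite | No | 0 |
-**Every iteration also goes through COST-GATE**, even at 0 credits.
-## Error handling
-| Failure | Handling |
-|---|---|
-| `gen image` returns failed | Retry once with a different seed; if it still fails, use the placeholder image |
-| Over quota / insufficient balance | Tell the user; do not downgrade automatically |
-| `commerce product list` returns < 9 | Downgrade to 2×2 and take the first 4 |
-| Session closed | State lives in the filesystem + git; resume next time |
-## Related tools
-- `gen image`: text-to-image (see the `gen` skill)
-- `commerce merchant get` / `commerce product list`: merchant profile / products (see the `merchant` and `product` skills)
-- `compose.py`: the Pillow renderer bundled with this skill
-## Workflow preferences
-- **If there is enough information, just do it (Fast-path)**
-- **`intent.md` is a traceable artifact**
-- **Commit to git as soon as each step completes**
-- **No interruptions during generation**
-- **Iterate with `-vN`; never overwrite**
-- **In a new session, mention unfinished work once**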
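
Reviewer sketch (not part of the package): Step 5 of the removed skill asks for a seed derived from a deterministic hash of `POST_ID` and the cell index, but does not name a hash function. One possible reading is shown below; SHA-256 and the 32-bit seed range are assumptions, and `cell_seed` is a hypothetical name. Python's built-in `hash()` is avoided because string hashes are salted per process by default, so they would not be stable across sessions.

```python
import hashlib


def cell_seed(post_id: str, cell_index: int) -> int:
    """Derive a stable seed from the post id and the cell index.

    Assumption: SHA-256 truncated to 32 bits; the skill text only
    requires that the value be deterministic.
    """
    key = f"{post_id}:{cell_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big")


# The same (post id, cell) pair always maps to the same seed,
# so redrawing the composite can reuse each cell's seed.
seeds = [cell_seed("20250101-0900-demo", i) for i in range(4)]
```

A retry "with a different seed" could then pass a different key, for example by appending an attempt counter, while iterations that keep a cell reuse the original value.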

package/.claude/skills/multigrid-poster/layouts/2x2.json DELETED

@@ -1,34 +0,0 @@
-{
-  "_comment": "Generic 2×2 grid layout: fits any 4-cell intent (founder story / comparison review / tutorial / scene)",
-  "canvas_size": [1242, 1660],
-  "cells": {
-    "positions": [[0, 0], [621, 0], [0, 830], [621, 830]],
-    "sizes": [[621, 830], [621, 830], [621, 830], [621, 830]]
-  },
-  "text_zones": {
-    "title": {
-      "_comment": "2-line title, above center; 8-12 characters per line works best",
-      "font": "shared/fonts/MaShanZheng-Regular.ttf",
-      "size": 110,
-      "color": "#FFB940",
-      "stroke_w": 8,
-      "stroke_color": "#D63D3D",
-      "lines": [
-        {"position": [621, 480], "anchor": "mm"},
-        {"position": [621, 620], "anchor": "mm"}
-      ]
-    },
-    "caption": {
-      "_comment": "2-line caption at the bottom; 15-20 characters per line works best",
-      "font": "shared/fonts/MaShanZheng-Regular.ttf",
-      "size": 78,
-      "color": "#FFB940",
-      "stroke_w": 6,
-      "stroke_color": "#D63D3D",
-      "lines": [
-        {"position": [621, 1340], "anchor": "mm"},
-        {"position": [621, 1450], "anchor": "mm"}
-      ]
-    }
-  }
-}

package/.claude/skills/multigrid-poster/layouts/3x3.json DELETED

@@ -1,43 +0,0 @@
-{
-  "_comment": "Generic 3×3 grid layout: fits 9 cells (product list / multi-angle showcase)",
-  "canvas_size": [1242, 1660],
-  "cells": {
-    "_comment": "9 cells of 414×420. Top 200px is the title area, cells span y=200..1460, bottom 200px is the caption area. 3×420 + 200×2 = 1660 = canvas_h ✓",
-    "positions": [
-      [0, 200],   [414, 200],   [828, 200],
-      [0, 620],   [414, 620],   [828, 620],
-      [0, 1040],  [414, 1040],  [828, 1040]
-    ],
-    "sizes": [
-      [414, 420], [414, 420], [414, 420],
-      [414, 420], [414, 420], [414, 420],
-      [414, 420], [414, 420], [414, 420]
-    ]
-  },
-  "text_zones": {
-    "title": {
-      "_comment": "2-line title in the top white margin (y < 200)",
-      "font": "shared/fonts/MaShanZheng-Regular.ttf",
-      "size": 78,
-      "color": "#FFB940",
-      "stroke_w": 6,
-      "stroke_color": "#D63D3D",
-      "lines": [
-        {"position": [621, 60], "anchor": "mm"},
-        {"position": [621, 150], "anchor": "mm"}
-      ]
-    },
-    "caption": {
-      "_comment": "2-line caption in the bottom white margin (cells end at y=1460, leaving 200px)",
-      "font": "shared/fonts/MaShanZheng-Regular.ttf",
-      "size": 60,
-      "color": "#FFB940",
-      "stroke_w": 5,
-      "stroke_color": "#D63D3D",
-      "lines": [
-        {"position": [621, 1530], "anchor": "mm"},
-        {"position": [621, 1610], "anchor": "mm"}
-      ]
-    }
-  }
-}
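
Reviewer sketch (not part of the package): the `positions` arrays in both removed layouts follow a regular row-major grid, so they can be re-derived from the cell size and the top margin to check the arithmetic stated in the 3×3 comment. `grid_positions` is a hypothetical helper; the cell sizes, margins, and canvas size are taken from the two JSON files.

```python
def grid_positions(cols: int, rows: int, cell_w: int, cell_h: int,
                   top_margin: int = 0) -> list[list[int]]:
    """Return top-left [x, y] corners in row-major order."""
    return [
        [col * cell_w, top_margin + row * cell_h]
        for row in range(rows)
        for col in range(cols)
    ]


CANVAS_W, CANVAS_H = 1242, 1660

# 2x2: four 621x830 cells fill the whole canvas.
two_by_two = grid_positions(2, 2, 621, 830)

# 3x3: nine 414x420 cells between a 200px title band and a 200px caption band.
three_by_three = grid_positions(3, 3, 414, 420, top_margin=200)

# The bands and cells add up to the canvas in both directions.
height_ok = 3 * 420 + 2 * 200 == CANVAS_H
width_ok = 3 * 414 == CANVAS_W and 2 * 621 == CANVAS_W
```

Both generated lists match the `positions` values in the deleted layout files, which suggests the layouts were laid out by this kind of formula rather than tuned by hand.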

package/.claude/skills/multigrid-poster/scripts/compose.py DELETED

@@ -1,116 +0,0 @@
-#!/usr/bin/env python3
-"""multigrid-poster compose: Pillow-based renderer
-Input: layout.json + N cell images + title text + caption text
-Output: 1242×1660 PNG (the standard Xiaohongshu cover size)
-Usage:
-  python compose.py \
-    --layout <SKILL_DIR>/layouts/2x2.json \
-    --cells cell_0.png cell_1.png cell_2.png cell_3.png \
-    --title-line "26岁一个人创业" \
-    --title-line "跨境电商月入10w+" \
-    --caption-line "只需要一部手机就可以完成！" \
-    --caption-line "跨境人的必备app推荐" \
-    --output cover.png
-Dependency: Pillow (bundled in the container, no extra install needed)
-Fonts: loaded from SKILL_DIR/shared/fonts/ (as specified in layout.json)
-"""
-import argparse
-import json
-import sys
-from pathlib import Path
-from PIL import Image, ImageDraw, ImageFont
-def compose(layout_path: Path, cell_paths: list[Path],
-            title_lines: list[str], caption_lines: list[str],
-            output_path: Path) -> Path:
-    """Render the poster according to layout.json.
-    layout.json fields:
-      canvas_size       [w, h]
-      cells.positions   [[x,y], ...] top-left corner of each cell
-      cells.sizes       [[w,h], ...] cell size
-      text_zones.title  {lines: [...], size, color, stroke_w, stroke_color, font}
-      text_zones.caption {same as above}
-    """
-    layout = json.loads(layout_path.read_text(encoding="utf-8"))
-    # Convention: the layout must live under <skill>/layouts/, and font paths resolve relative to <skill>/
-    skill_dir = layout_path.parent.parent
-    # Text line count vs. lines configured in the layout: zip silently truncates extra lines, so fail fast explicitly
-    # Error messages are in English: sys.exit writes to stderr, and Chinese text is garbled under a Windows GBK locale
-    for zone_key, lines in [("title", title_lines), ("caption", caption_lines)]:
-        zone = layout.get("text_zones", {}).get(zone_key)
-        if not zone:
-            continue
-        max_lines = len(zone["lines"])
-        if len(lines) > max_lines:
-            sys.exit(f"too many {zone_key} lines: got {len(lines)}, layout supports {max_lines}")
-    # Canvas
-    canvas = Image.new("RGB", tuple(layout["canvas_size"]), "white")
-    # Cells
-    cell_count = len(layout["cells"]["positions"])
-    if len(cell_paths) != cell_count:
-        sys.exit(f"cell count mismatch: layout expects {cell_count}, got {len(cell_paths)}")
-    for cp, pos, size in zip(cell_paths,
-                              layout["cells"]["positions"],
-                              layout["cells"]["sizes"]):
-        img = Image.open(cp).convert("RGB").resize(tuple(size), Image.LANCZOS)
-        canvas.paste(img, tuple(pos))
-    draw = ImageDraw.Draw(canvas)
-    # Text zones (title + caption share the same drawing code)
-    for zone_key, lines in [("title", title_lines), ("caption", caption_lines)]:
-        if zone_key not in layout["text_zones"]:
-            continue
-        zone = layout["text_zones"][zone_key]
-        font_path = skill_dir / zone["font"]
-        font = ImageFont.truetype(str(font_path), zone["size"])
-        for text, line_cfg in zip(lines, zone["lines"]):
-            if not text:
-                continue
-            draw.text(
-                tuple(line_cfg["position"]), text,
-                fill=zone["color"], font=font,
-                stroke_width=zone.get("stroke_w", 0),
-                stroke_fill=zone.get("stroke_color", zone["color"]),
-                anchor=line_cfg.get("anchor", "la"),
-            )
-    canvas.save(output_path, optimize=True)
-    return output_path
-def main():
-    ap = argparse.ArgumentParser(description=__doc__,
-                                  formatter_class=argparse.RawDescriptionHelpFormatter)
-    ap.add_argument("--layout", required=True, type=Path,
-                    help="path to layout.json (e.g. SKILL_DIR/layouts/2x2.json)")
-    ap.add_argument("--cells", required=True, nargs="+", type=Path,
-                    help="paths to the N cell images, in the order of layout.cells.positions")
-    ap.add_argument("--title-line", action="append", default=[],
-                    help="title line (repeatable)")
-    ap.add_argument("--caption-line", action="append", default=[],
-                    help="bottom caption line (repeatable)")
-    ap.add_argument("--output", required=True, type=Path,
-                    help="output PNG path")
-    args = ap.parse_args()
-    out = compose(
-        layout_path=args.layout,
-        cell_paths=args.cells,
-        title_lines=args.title_line,
-        caption_lines=args.caption_line,
-        output_path=args.output,
-    )
-    print(f"saved {out} ({out.stat().st_size // 1024} KB)")
-if __name__ == "__main__":
-    main()
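
Reviewer sketch (not part of the package): the removed `compose.py` checks the number of text lines before drawing because `zip` stops at the shorter input without raising. The snippet below isolates that behavior. `pair_lines_with_slots` is a hypothetical helper, and it raises `ValueError` where the original script calls `sys.exit`, so that it can be exercised without ending the process.

```python
def pair_lines_with_slots(zone_key: str, lines: list[str],
                          slots: list[dict]) -> list[tuple[str, dict]]:
    """Pair each text line with a layout slot, refusing to drop extra lines."""
    if len(lines) > len(slots):
        raise ValueError(
            f"too many {zone_key} lines: got {len(lines)}, "
            f"layout supports {len(slots)}"
        )
    return list(zip(lines, slots))


slots = [{"position": [621, 60]}, {"position": [621, 150]}]

# Plain zip drops the third line with no error at all.
silently_truncated = list(zip(["a", "b", "c"], slots))

# Fewer lines than slots is fine: the unused slot is simply left empty.
paired = pair_lines_with_slots("title", ["a"], slots)
```

On Python 3.10 and later, `zip(..., strict=True)` is an alternative, though it would also reject the fewer-lines case that the original script allows.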

package/.claude/skills/multigrid-poster/scripts/placeholder.png DELETED

Binary file

package/.claude/skills/multigrid-poster/shared/fonts/MaShanZheng-Regular.ttf DELETED

Binary file

package/.claude/skills/video-compose/SKILL.md DELETED

@@ -1,144 +0,0 @@
----
-name: video-compose
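-description: "Automatically composes [multiple footage clips + a voice-over script] into a publish-ready short video: automatic voice-over, clip selection and assembly driven by the script's meaning, burned-in subtitles, and BGM. The user only has to shoot footage or generate clips with AI and write the script; editing, voice-over, and subtitles are all left to the AI.
-  Prerequisite: the user has ≥1 video clip file plus a voice-over script (or is willing to write one now).
-  Triggers: the user provides several video clips plus a script and says '拼个视频' (put a video together) / '把这堆素材剪一下' (cut this footage) / '给这段文案配视频' (make a video for this script) / '做条口播带画面的视频' (make a voice-over video with visuals) / '素材+文案做成片' (turn footage + script into a finished video).
-  Does not trigger: the user gives only 1 talking-head video to edit (use video-edit); gives only product images to generate visuals from (use video-gen); or wants to concatenate clips while [keeping the original audio] (use video-edit; this skill discards the original audio)."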
-version: 1.0.0
-owner_repo: Optima-Chat/optima-gen
----
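-# video-compose: footage + script → finished video
-The user gives you a pile of clips plus a voice-over script; you deliver a vertical finished video with voice-over, matching visuals, subtitles, and BGM.
-> ⚠ **Semantic premise: clips are treated as pure visual b-roll and their original audio is always discarded**; the final audio = AI voice-over + BGM. If the user's clips are talking footage with voices and they want to **keep the original audio**, that is a job for video-edit. Before starting, confirm with the user that "the clips' original audio will be removed and AI voice-over is used throughout", so nobody discovers the missing sound only after the video is rendered.
-**Key point: you (Claude) make the clip selection yourself by looking at frames.** The script extracts 3 frames per clip, you view the images with Read, and you write `proposal.json` according to the script's meaning. No external vision API is needed.
-## Tools
-`python3 $CLAUDE_SKILL_DIR/scripts/video_compose.py <frames|build> <proj-dir>`
-- Dependencies (bundled in the container): `python3`, `ffmpeg`, `gen` CLI.
-- Emotional voice-over goes through `gen tts --provider minimax` (the key and billing live in the optima-generation backend; the skill never touches secrets).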
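-## Working directory
-```
-<proj>/inputs/clips/*.mp4    footage (any names; clip ids come from sorted filenames)
-<proj>/inputs/script.txt     voice-over script, one sentence per line = one segment
-<proj>/work/                 intermediate artifacts (frames/ proposal.json subs.ass etc.)
-<proj>/final.mp4             finished video
-```
-## Step 0: Read back the instruction checklist (mandatory when there are ≥ 2 actions)
-When one user message carries several requirements (e.g. "vertical + no BGM + bigger subtitles + cut to 20 seconds"), **first break them into an atomic checklist, read it back, and wait for confirmation before acting**; do not execute while still reading. Skip this for a single action ("put a video together").
-(Same rationale as video-edit: executing multiple instructions directly makes it easy to miss one, and a miss only surfaces once the video is rendered, forcing a full redo.)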
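-## Main flow
-### 1. Create the project + place footage and script
-- Create `<proj>/inputs/clips/` and `<proj>/inputs/script.txt`
-- Copy the user's clips into clips/ (any names; `01.mp4 02.mp4 …` is recommended for easy reference)
-- Write the script into script.txt, **one sentence per line**. If the user gave no script, write one for them first (pick the theme from the footage), then **have the user confirm the script** before continuing.
-### 2. Extract frames
-```
-python .../video_compose.py frames <proj>
-```
-Produces `work/frames/<clipid>_<tag>.jpg` (3~6 frames per clip, adaptive, roughly one frame every 5s) + `work/clips_manifest.json`. **Each frame in the manifest carries `t` (seconds) = the time of that sub-shot within the clip**; use it to set `src_start` when writing the proposal.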
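-### 3. View frames + write proposal.json (**your core judgment**)
-- **View every frame** in `work/frames/` with Read, and form a one-sentence mental description of each clip (people / action / scene / whether it has burned-in subtitles).
-- For each script sentence, pick the clip that best fits its meaning and write `work/proposal.json`:
-```json
-{
-  "voice": "Chinese (Mandarin)_Warm_Girl",
-  "bgm_mood": "warm",
-  "assignments": [
-    { "segment_idx": 0, "text": "<first sentence, verbatim>", "clip": "01", "src_start": 5.2, "emotion": "sad", "speed": 0.92, "rationale": "why this clip + why this sub-shot + why this emotion" }
-  ]
-}
-```
-- **voice**: the voice_id. **Use only the 7 tested, working voices in `voice-samples/CATALOG.md`** (label ↔ voice_id); do not enter unverified ones (a wrong voice_id fails with `UPSTREAM_UNKNOWN: voice id not exist`). The default is `Chinese (Mandarin)_Warm_Girl` (warm young female). **Let the user audition and pick the actual voice; see §4.**
-- **clip**: the footage id (the `id` in the manifest)
-- **src_start**: the **second** of the clip at which to start cutting (= the `t` of the sub-shot frame you picked; see the manifest). **This is the key to avoiding repeated shots**: when one clip is reused for several sentences, give each sentence a **different `t`** (e.g. pouring close-up t=11.5, clinking glasses t=20.7); the script cuts centered on that moment, so the visuals do not repeat. If omitted, the script staggers reuses evenly in order (still no repeats, but the chosen sub-shot may not match the script, so setting it explicitly is recommended).
-- **emotion**: set per sentence, one of nine: `happy/sad/angry/fearful/disgusted/surprised/calm/fluent/whisper`, matched to the sentence's mood
-- **speed**: 0.5-2.0. **This is a tool for Douyin/TikTok/Xiaohongshu short videos, so the default should be fast**: with no speed set, the script voices at `DEFAULT_SPEED=1.35` (≈ TikTok voice-over pace); write 1.5 explicitly for more punch; **write ≤1.0 (e.g. 0.9) only when a slow, soothing or lyrical pace is explicitly wanted**. Do not let the finished video sound sluggish.
-- **bgm_mood / bgm**: see §BGM (if the user specifies nothing, fill `bgm_mood` from the script's mood)
-Clip selection rules:
-- For the first sentence, prefer a shot that establishes the scene or shows the subject head-on
-- If the footage has a visual storyline in time order, respect it
-- **Clips may be reused, but each reuse must have a different `src_start` (a different sub-shot), or the visuals repeat**; also try not to use the same clip for two consecutive sentences
-- **Each clip usually contains several sub-shots (multiple frames in the manifest). Prefer assigning different sub-shots to different sentences so all the footage appears, rather than reusing clip openings**
-- When there are fewer clips than sentences, reuse clips and tell the user the footage is on the thin side
-- **If a clip's frames show burned-in subtitles or a watermark** (see §坑, the pitfalls table), try not to pick it for key sentences; if you must, tell the user
-Emotion assignment: give each sentence an emotion following the script's emotional arc; do not use one tone throughout. Example: wistful opening `sad` → delighted turn `happy` → climax `surprised` → easing `calm` → heartwarming close `happy`.
-Pacing (platform style): this is short video, so default to a **fast pace** (like Douyin/TikTok voice-over). Keep the script to **short sentences with strong hooks; avoid long sentences** (a long sentence drags the voice-over and slows the cuts); the default speed is 1.35 and can go faster for more punch. Unless the user explicitly wants a slow, soothing piece, avoid a sluggish "reading a textbook aloud" feel.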
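-### 4. Audition voices / BGM + review the clip plan → the user decides
-**Voice and BGM are creative decisions; let the user listen and decide, and do not lock in defaults on their behalf.** Before rendering:
-> **Asset paths**: the BGM library and voice samples ship with `gen-cli` (not in the skill directory). Resolve the path once first:
-> ```bash
-> ASSETS=$(python3 $CLAUDE_SKILL_DIR/scripts/video_compose.py assets-dir)
-> ```
-> Below, use `$ASSETS/voice-samples` and `$ASSETS/bgm-library/<mood>`.
-1. **Voice audition**: copy `$ASSETS/voice-samples/*.mp3` (7 voice samples; label ↔ voice_id is in `CATALOG.md` in the same directory) to `<proj>/previews/`, list playable file links for the user, and have the user **listen and pick one**.
-2. **BGM audition**: first recommend one category from the script's mood (soothing → `warm`, product recommendation → `upbeat`…), then copy that category's `$ASSETS/bgm-library/<mood>/*.mp3` (copy more categories if the user wants to hear more) to `<proj>/previews/` for the user to audition; the user may switch category, name a specific track, or **upload their own** (placed in `inputs/bgm/`, which takes priority over the mood library).
-3. **Clip plan**: show the user a summary of the assignments (which clip for each sentence + the reason) as well.
-4. **Offer recommended defaults** (voice = one that fits the script, BGM = one category by mood); a user who wants to save effort can say "just use the defaults" and go straight to rendering, or audition first to change them.
-Once the user decides: write the chosen voice_id into `proposal.voice`, and the BGM into `bgm` (own file path) or `bgm_mood` (mood library), then:
-```
-python3 $CLAUDE_SKILL_DIR/scripts/video_compose.py build <proj>
-```
-The script then: voices each sentence with emotion (cached; unchanged sentences are not rerun) → for each sentence, uses `-ss` at `src_start` to **cut precisely to the matching sub-shot** for that sentence's duration (vertical crop; if the clip is shorter than the sentence it is **slowed down to fill, not looped**; **reuses of the same clip are forced into non-overlapping time windows to prevent repeated shots**) → concatenates → burns subtitles → BGM sidechain ducking (only when there is BGM) → `final.mp4`.
-### 5. Delivery report
-Report the output path + duration + number of clips used + voice-over character count. **Do not mention any model or service name**; always call the voice-over "AI voice-over".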
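-## BGM (not locked; if the user specifies nothing, pick automatically from the script's mood)
-Source priority:
-1. `proposal["bgm"]`: an explicit path (the user named a specific track)
-2. `inputs/bgm/`: audio the user uploaded
-3. `proposal["bgm_mood"]` → **pick one at random** from the mood library `bgm-library/<mood>/` (when the user specifies no BGM, **you fill this field from the script's overall mood**)
-4. None of the above → voice only
-- With BGM, sidechain ducking is applied automatically (BGM is lowered while the voice is speaking)
-- **Prefer letting the user audition and choose** (see §4, step 2); only when the user says "you decide / whatever" do you fill `bgm_mood` automatically from the script's mood, and **never leave it empty**. Mood → directory: `warm` (soothing/warm) `upbeat` (cheerful) `sad` (melancholy) `calm` (relaxed) `energetic` (high-energy/sales) `dramatic` (dramatic/tech)
-- The user explicitly wants a specific track or wants a change → use priority 1 or 2
-- ⚠ The library may hold only tracks **licensed for commercial use** (publicly published videos have copyright requirements); when a mood directory is empty, BGM is skipped and a notice is shown
-## What to do when the user changes the plan
-- Change the visuals for a sentence: edit that entry's `clip` (different footage) or `src_start` (different sub-shot in the same footage) in `proposal.json`, then rerun `build`
-- Two sentences look repetitive: give them different `src_start` values (see the frames' `t` in the manifest), then rerun `build`
-- Change emotion/voice: edit `emotion`/`voice`, then rerun `build`
-- Change/add BGM: drop a file into `inputs/bgm/` or edit the `bgm` path, then rerun `build`
-**Voice-over is cached** (reused when engine/voice/emotion/speed/text are unchanged), so changing only BGM or subtitles **does not pay to rerun the voice-over** and finishes in seconds. `proposal.json` is the handle you keep adjusting.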
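-## 坑 (Pitfalls, hit in real runs)
-| Pitfall | Handling |
-|---|---|
-| **User footage has burned-in subtitles/watermarks** | If, while viewing frames, you find a clip already has hard subtitles (e.g. an English caption burned into the source), new subtitles laid on top will ghost. Avoid that clip for key sentences, or tell the user it has native subtitles |
-| **Clip is landscape** | By default the script center-crops to vertical, cutting off both sides; when much of the footage is landscape, tell the user the output is a vertical crop |
-| **Total footage duration < voice-over duration** | No more looping (loop = repetition): a clip shorter than its sentence is slowed down to fill; a clip reused by several sentences without enough duration for distinct visuals triggers a `[mix] ⚠` warning. **To eliminate repeats entirely**: total footage duration should be ≥ total voice-over duration, and no clip should be reused more times than the number of distinct segments it can yield; otherwise tell the user to add footage or trim the script |
-| **Subtitle style/font** | Default is white text with a black outline at the bottom. The font comes from env `VIDEO_COMPOSE_FONT` (default `Noto Sans CJK SC`). The prod shell image (root `Dockerfile` of optima-ai-shell, Ubuntu 22.04) already installs `fonts-noto-cjk` (which provides `Noto Sans CJK SC`) + Source Han Sans SC, so CJK fonts are confirmed available. Before `build`, `_preflight_font()` runs an `fc-match` check and fails loudly if the font is missing rather than silently rendering tofu boxes. When changing the image or the env font, verify with fc-match the same way |
-| **gen tts errors** | Backend errors are passed through. `PROVIDER_INSUFFICIENT_CREDITS` = MiniMax balance; `INVALID_INPUT` = invalid emotion/voice. If voice-over fails, no video is produced, and the failure is not silently swallowed |
-## Key parameters (top of scripts/video_compose.py)
-- `W,H=1080,1920` (vertical), `FPS=30`, `CRF=20`
-- Voice-over: `gen tts --provider minimax`; the key and billing live in the backend, and the skill never touches secrets
-- `VIDEO_COMPOSE_FONT`: subtitle font (the container must have a matching CJK font)
-- BGM is not locked: see §BGM; the mood library is packaged with `gen-cli` and the script resolves it automatically (`$ASSETS/bgm-library/<mood>/`; see the asset paths in §4)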
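
Reviewer sketch (not part of the package): the BGM source priority described in the removed skill (explicit `bgm` path, then files in `inputs/bgm/`, then a random track from the `bgm_mood` library, then voice only) amounts to a short fall-through. The diff does not include `video_compose.py` itself, so `resolve_bgm` and its parameters are hypothetical, and picking the first uploaded file in sorted order is an assumption the skill text does not confirm.

```python
import random
from typing import Optional


def resolve_bgm(proposal: dict, uploaded: list[str],
                library: dict[str, list[str]]) -> Optional[str]:
    """Return the BGM track to use, or None for a voice-only mix."""
    # 1. An explicit path in the proposal wins.
    if proposal.get("bgm"):
        return proposal["bgm"]
    # 2. Then audio the user uploaded (assumption: first in sorted order).
    if uploaded:
        return sorted(uploaded)[0]
    # 3. Then a random track from the mood directory, if it has any.
    tracks = library.get(proposal.get("bgm_mood", ""), [])
    if tracks:
        return random.choice(tracks)
    # 4. Nothing available: voice only.
    return None


library = {"warm": ["warm/track_a.mp3"], "sad": []}
choice = resolve_bgm({"bgm_mood": "warm"}, [], library)
```

An empty mood directory falls through to `None`, which corresponds to the skill's rule that BGM is skipped when a mood has no tracks.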