videoconverter-worker 1.0.0 (tar.gz)

+++ PKG-INFO
@@ -0,0 +1,81 @@
+ Metadata-Version: 2.4
+ Name: videoconverter-worker
+ Version: 1.0.0
+ Summary: VideoConverter Python Worker: reads tasks from the queue directory and performs splitting, subtitle removal, and merging
+ License: MIT
+ Keywords: videoconverter,ffmpeg,worker,video
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Python: >=3.8
+ Description-Content-Type: text/plain
+ 
+ Python Worker Usage Guide
+ 
+ 0) Installation (pick one)
+ - Install from PyPI (recommended; works on Windows/Linux/macOS):
+     pip install videoconverter-worker
+   After installing, run: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install from local source: cd src/python && pip install .
+ 
+ 1) What it does
+ Mirrors the behavior of the Java BackendWorker: reads tasks from the queue directory (queue/<task_id>.json),
+ executes SPLIT / DESUBTITLE / MERGE / ONE_CLICK_COMPOSE, and updates task status and metadata.json.
+ Suited to running on a server, using its many cores and faster hardware for splitting, subtitle removal, and merging.
+ 
+ 2) Requirements
+ - Python 3.8+
+ - ffmpeg and ffprobe installed on the system (or available on PATH)
+ - Optional: point the FFMPEG_PATH and FFPROBE_PATH environment variables at the binaries
+ 
+ 3) Sharing one queue with Java on the same machine
+ - Just use the same data directory, e.g.: ~/.videoconverter/data
+ - Have the Java frontend create tasks first (writing queue/*.json), then run the Python worker locally to process them:
+     cd src/python
+     python worker.py
+ - Or point it at a specific data directory:
+     python worker.py --data-dir /path/to/.videoconverter/data
+ 
+ 4) Running on a server after copying over a folder prepared by the frontend
+ - Copy the entire queue "data directory" from your local machine to the server (e.g. /server/data);
+   it should contain the queue/ directory and the queue/*.json task files.
+ - If the paths inside the task JSON are local absolute paths (e.g. /Users/me/videos/a.mp4),
+   rewrite them on the server, or the files will not be found:
+     python worker.py --data-dir /server/data --path-replace "/Users/me/videos=/server/videos"
+ - Environment-variable form (convenient for scripts/systemd):
+     export VIDEOCONVERTER_DATA_DIR=/server/data
+     export VIDEOCONVERTER_PATH_REPLACE="/Users/me/videos=/server/videos"
+     python worker.py
+ - Tip: keep videos in a fixed directory on the server (e.g. /server/videos), use a single local
+   prefix for all paths in the copied queue JSON, and swap it for the server prefix with
+   --path-replace. A sample task file is shown below.
+ 
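+ For reference, a queue/<task_id>.json written by the worker's create_desubtitle_task looks
+ roughly like this sketch (values and paths are illustrative, not a verbatim file); --path-replace
+ rewrites the input_file/output_dir prefixes, plus config.inputPath/config.outputPath, when a task is claimed:
+ 
+     {
+       "task_id": "3f2a9c1e-...",
+       "task_type": "DESUBTITLE",
+       "parent_task_id": "",
+       "video_id": "a_1b2c3d4e",
+       "input_file": "/Users/me/videos/a_1b2c3d4e/original/chunk_000.mp4",
+       "output_dir": "/Users/me/videos",
+       "config": {},
+       "status": "PENDING",
+       "progress": 0.0,
+       "progress_text": "Waiting for subtitle removal...",
+       "created_time": 1700000000000,
+       "error_message": ""
+     }
+ 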
+ 5) Multi-process concurrency
+ The current worker is a single-process, single-threaded loop. To saturate multiple cores, start several processes on the same data-dir:
+ - Task claiming works by atomically creating queue/<task_id>.lock, so multiple processes never grab the same task.
+ - Example (4 worker processes):
+     for i in 1 2 3 4; do python worker.py --data-dir /server/data & done
+   Or run several worker instances under systemd/supervisor; a sketch of a template unit follows.
+ 
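+ A minimal systemd template unit for the "several instances" route; the unit name, user setup,
+ and paths here are assumptions for illustration, nothing like this ships with the package:
+ 
+     # /etc/systemd/system/videoconverter-worker@.service
+     [Unit]
+     Description=VideoConverter worker %i
+     After=network.target
+ 
+     [Service]
+     Environment=VIDEOCONVERTER_DATA_DIR=/server/data
+     ExecStart=/usr/bin/env videoconverter
+     Restart=on-failure
+ 
+     [Install]
+     WantedBy=multi-user.target
+ 
+ Then, e.g.: systemctl enable --now videoconverter-worker@{1..4}
+ 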
+ 6) Pause/resume
+ Queue pausing is controlled by the queue_paused key in queue_config.json ("true" means paused).
+ The Python worker re-reads this config periodically and takes no new tasks while it is "true".
+ It can be toggled from the Java frontend or by editing the file by hand, as shown below.
+ 
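+ The whole config file is currently just that one flag; note the value is the string
+ "true"/"false", not a JSON boolean:
+ 
+     {
+       "queue_paused": "false"
+     }
+ 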
+ 7) Conventions shared with Java
+ - The task and config formats are documented in queue_task_schema.txt at the project root.
+ - metadata.json matches the format the Java side produces; merging (MERGE) is driven by the
+   metadata. An abridged example follows.
+ 
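+ An abridged metadata.json as the Python splitter writes it (values are illustrative):
+ 
+     {
+       "videoId": "a_1b2c3d4e",
+       "originalPath": "/server/videos/a.mp4",
+       "chunkSize": 120.0,
+       "totalChunks": 2,
+       "chunks": [
+         {
+           "chunkId": "chunk_000",
+           "startTime": 0.0,
+           "endTime": 120.0,
+           "originalPath": "a_1b2c3d4e/original/chunk_000.mp4",
+           "processedPath": "",
+           "status": "pending",
+           "processedAt": "",
+           "errorMessage": ""
+         }
+       ],
+       "createdAt": "2024-01-01T00:00:00Z"
+     }
+ 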
+ 8) Packaging and deployment
+ - Build a zip (copy to the server, unzip, run):
+     cd src/python
+     ./build_deploy.sh
+   This produces videoconverter.zip; unzip it and run, inside that directory:
+     python3 worker.py [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install it as a command-line tool (works locally or on the server):
+     cd src/python
+     pip install .
+   Then run it directly: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+++ README.txt
@@ -0,0 +1,64 @@
+ Python Worker Usage Guide
+ 
+ 0) Installation (pick one)
+ - Install from PyPI (recommended; works on Windows/Linux/macOS):
+     pip install videoconverter-worker
+   After installing, run: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install from local source: cd src/python && pip install .
+ 
+ 1) What it does
+ Mirrors the behavior of the Java BackendWorker: reads tasks from the queue directory (queue/<task_id>.json),
+ executes SPLIT / DESUBTITLE / MERGE / ONE_CLICK_COMPOSE, and updates task status and metadata.json.
+ Suited to running on a server, using its many cores and faster hardware for splitting, subtitle removal, and merging.
+ 
+ 2) Requirements
+ - Python 3.8+
+ - ffmpeg and ffprobe installed on the system (or available on PATH)
+ - Optional: point the FFMPEG_PATH and FFPROBE_PATH environment variables at the binaries
+ 
+ 3) Sharing one queue with Java on the same machine
+ - Just use the same data directory, e.g.: ~/.videoconverter/data
+ - Have the Java frontend create tasks first (writing queue/*.json), then run the Python worker locally to process them:
+     cd src/python
+     python worker.py
+ - Or point it at a specific data directory:
+     python worker.py --data-dir /path/to/.videoconverter/data
+ 
+ 4) Running on a server after copying over a folder prepared by the frontend
+ - Copy the entire queue "data directory" from your local machine to the server (e.g. /server/data);
+   it should contain the queue/ directory and the queue/*.json task files.
+ - If the paths inside the task JSON are local absolute paths (e.g. /Users/me/videos/a.mp4),
+   rewrite them on the server, or the files will not be found:
+     python worker.py --data-dir /server/data --path-replace "/Users/me/videos=/server/videos"
+ - Environment-variable form (convenient for scripts/systemd):
+     export VIDEOCONVERTER_DATA_DIR=/server/data
+     export VIDEOCONVERTER_PATH_REPLACE="/Users/me/videos=/server/videos"
+     python worker.py
+ - Tip: keep videos in a fixed directory on the server (e.g. /server/videos), use a single local
+   prefix for all paths in the copied queue JSON, and swap it for the server prefix with --path-replace.
+ 
+ 5) Multi-process concurrency
+ The current worker is a single-process, single-threaded loop. To saturate multiple cores, start several processes on the same data-dir:
+ - Task claiming works by atomically creating queue/<task_id>.lock, so multiple processes never grab the same task.
+ - Example (4 worker processes):
+     for i in 1 2 3 4; do python worker.py --data-dir /server/data & done
+   Or run several worker instances under systemd/supervisor.
+ 
+ 6) Pause/resume
+ Queue pausing is controlled by the queue_paused key in queue_config.json ("true" means paused).
+ The Python worker re-reads this config periodically and takes no new tasks while it is "true".
+ It can be toggled from the Java frontend or by editing the file by hand.
+ 
+ 7) Conventions shared with Java
+ - The task and config formats are documented in queue_task_schema.txt at the project root.
+ - metadata.json matches the format the Java side produces; merging (MERGE) is driven by the metadata.
+ 
+ 8) Packaging and deployment
+ - Build a zip (copy to the server, unzip, run):
+     cd src/python
+     ./build_deploy.sh
+   This produces videoconverter.zip; unzip it and run, inside that directory:
+     python3 worker.py [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install it as a command-line tool (works locally or on the server):
+     cd src/python
+     pip install .
+   Then run it directly: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+++ ffmpeg_runner.py
@@ -0,0 +1,278 @@
+ # -*- coding: utf-8 -*-
+ """
+ FFmpeg splitting, subtitle removal, and merging, aligned with the logic of the Java
+ ChunkSplitService / FFmpegService / ChunkMergeService.
+ """
+ import datetime
+ import json
+ import logging
+ import math
+ import os
+ import subprocess
+ import tempfile
+ import uuid
+ from pathlib import Path
+ from typing import List, Optional, Tuple
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ def _find_ffmpeg() -> str:
+     return os.environ.get("FFMPEG_PATH", "ffmpeg")
+ 
+ 
+ def _find_ffprobe() -> str:
+     return os.environ.get("FFPROBE_PATH", "ffprobe")
+ 
+ 
+ def _format_time(seconds: float) -> str:
+     # Render seconds as M:SS(.ff) below one hour, else H:MM:SS(.ff); clamp negatives to "0".
+     if seconds < 0:
+         return "0"
+     if seconds < 3600:
+         m = int(seconds // 60)
+         s = seconds % 60
+         return f"{m}:{s:05.2f}" if s != int(s) else f"{m}:{int(s):02d}"
+     h = int(seconds // 3600)
+     m = int((seconds % 3600) // 60)
+     s = seconds % 60
+     return f"{h}:{m:02d}:{s:05.2f}" if s != int(s) else f"{h}:{m:02d}:{int(s):02d}"
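+ 
+ 
+ # Worked examples of _format_time (derived from the branches above):
+ #   _format_time(90)     -> "1:30"
+ #   _format_time(3725.5) -> "1:02:05.50"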
+ 
+ 
+ def get_duration(video_path: str) -> float:
+     cmd = [
+         _find_ffprobe(),
+         "-v", "error",
+         "-show_entries", "format=duration",
+         "-of", "default=noprint_wrappers=1:nokey=1",
+         video_path,
+     ]
+     r = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
+     if r.returncode != 0:
+         raise RuntimeError(f"ffprobe failed: {r.stderr or r.stdout}")
+     line = (r.stdout or "").strip()
+     if not line:
+         return 0.0
+     return float(line)
+ 
+ 
+ def split_chunk(video_path: str, start_time: float, duration: float, output_path: str) -> None:
+     # -ss before -i seeks on the input (fast, keyframe-aligned); -c copy avoids re-encoding.
+     cmd = [
+         _find_ffmpeg(),
+         "-ss", _format_time(start_time),
+         "-i", video_path,
+         "-t", _format_time(duration),
+         "-c", "copy",
+         "-avoid_negative_ts", "make_zero",
+         "-y", output_path,
+     ]
+     run_ffmpeg(cmd)
+ 
+ 
+ def run_ffmpeg(cmd: List[str], timeout: Optional[int] = None) -> None:
+     logger.debug("FFmpeg: %s", " ".join(cmd))
+     p = subprocess.run(
+         cmd,
+         capture_output=True,
+         text=True,
+         timeout=timeout or 86400,
+     )
+     if p.returncode != 0:
+         raise RuntimeError(f"FFmpeg exit code {p.returncode}: {p.stderr[:2000] if p.stderr else p.stdout}")
+ 
+ 
+ def build_video_filter(config: dict, original_width: int, original_height: int) -> str:
+     """Aligned with the Java buildFilterFor1080pInput / buildFilterFor4KInput."""
+     target_width = config.get("targetWidth", 1920)
+     target_height = config.get("targetHeight", 1080)
+     crop_bottom = config.get("cropBottom", 0)
+     if target_width <= 0 or target_height <= 0:
+         target_width, target_height = 1920, 1080
+     cropped_height = original_height - crop_bottom
+ 
+     if original_width == 1920 and original_height == 1080:
+         filters = []
+         if crop_bottom > 0:
+             filters.append(f"crop={original_width}:{cropped_height}:0:0")
+             current_h = cropped_height
+         else:
+             current_h = original_height
+         current_w = original_width
+         aspect = current_w / current_h if current_h else 16 / 9
+         if abs(aspect - 16 / 9) > 0.01:
+             final_w = int(current_h * 16 / 9)
+             crop_x = (current_w - final_w) // 2
+             filters.append(f"crop={final_w}:{current_h}:{crop_x}:0")
+             current_w = final_w
+         if current_w != target_width or current_h != target_height:
+             filters.append(f"scale={target_width}:{target_height}")
+         return ",".join(filters) if filters else "null"
+ 
+     # 4K/high-resolution inputs: crop bottom -> scale -> crop sides -> scale
+     scaled_w = target_width
+     scaled_h = int(cropped_height * scaled_w / original_width)
+     final_crop_w = int(scaled_h * 16 / 9)
+     crop_x = (scaled_w - final_crop_w) // 2
+     return (
+         f"crop={original_width}:{cropped_height}:0:0,"
+         f"scale={scaled_w}:{scaled_h},"
+         f"crop={final_crop_w}:{scaled_h}:{crop_x}:0,"
+         f"scale={target_width}:{target_height}"
+     )
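+ 
+ 
+ # Worked example of the 4K branch above: a 3840x2160 input with cropBottom=160 and the
+ # default 1920x1080 target yields
+ #   crop=3840:2000:0:0,scale=1920:1000,crop=1777:1000:71:0,scale=1920:1080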
+ 
+ 
+ def build_desubtitle_command(config: dict, input_path: str, output_path: str) -> List[str]:
+     """Build the subtitle-removal FFmpeg command (software libx264 encoding; portable across servers)."""
+     start_time = config.get("startTime", 0) or 0
+     end_time = config.get("endTime", 0) or 0
+     keep_audio = config.get("keepAudio", True)
+     audio_bitrate = config.get("audioBitrate", 192)
+     video_quality = config.get("videoQuality", 23)
+     original_width = config.get("originalWidth", 0) or 1920
+     original_height = config.get("originalHeight", 0) or 1080
+     force_keyframe = config.get("forceKeyframeAtStart", False)
+ 
+     cmd = [_find_ffmpeg()]
+     if start_time > 0:
+         cmd += ["-ss", _format_time(start_time)]
+     cmd += ["-i", input_path]
+     if end_time > 0 and start_time >= 0:
+         duration = end_time - start_time
+         if duration > 0:
+             cmd += ["-t", _format_time(duration)]
+     elif end_time > 0:
+         cmd += ["-t", _format_time(end_time)]
+ 
+     vf = build_video_filter(config, original_width, original_height)
+     if vf and vf != "null":
+         cmd += ["-vf", vf]
+     cmd += ["-c:v", "libx264", "-preset", "medium", "-crf", str(video_quality)]
+     if force_keyframe:
+         cmd += ["-force_key_frames", "expr:eq(n,0)"]
+     if keep_audio:
+         cmd += ["-c:a", "aac", "-b:a", f"{audio_bitrate}k", "-ac", "2"]
+     else:
+         cmd += ["-an"]
+     cmd += ["-y", output_path]
+     return cmd
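+ 
+ 
+ # Illustrative result: with an all-default config and a plain 1920x1080 input (so the
+ # filter chain collapses to "null" and is skipped), the builder above returns
+ #   ffmpeg -i IN -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 192k -ac 2 -y OUT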
+ 
+ 
+ def run_desubtitle(config: dict, input_path: str, output_path: str, progress_callback=None) -> bool:
+     cmd = build_desubtitle_command(config, input_path, output_path)
+     run_ffmpeg(cmd)
+     return True
+ 
+ 
+ def merge_with_concat(filelist_path: str, output_path: str) -> None:
+     # concat demuxer: filelist_path holds "file '<path>'" lines; -safe 0 permits absolute paths.
+     cmd = [
+         _find_ffmpeg(),
+         "-f", "concat", "-safe", "0",
+         "-i", filelist_path,
+         "-c", "copy",
+         "-y", output_path,
+     ]
+     run_ffmpeg(cmd)
+ 
+ 
+ def trim_video(input_path: str, start_offset: float, duration: float, output_path: str) -> None:
+     cmd = [
+         _find_ffmpeg(),
+         "-i", input_path,
+         "-ss", _format_time(start_offset),
+         "-t", _format_time(duration),
+         "-c", "copy",
+         "-y", output_path,
+     ]
+     run_ffmpeg(cmd)
+ 
+ 
+ def split_video_to_chunks(
+     video_path: str,
+     output_dir: str,
+     chunk_size_sec: float,
+     range_start: float,
+     range_end: float,
+ ) -> Tuple[dict, str]:
+     """
+     Split the video into chunks under output_dir/<video_id>/ and return (metadata_dict, video_id).
+     """
+     path = Path(video_path)
+     if not path.exists():
+         raise FileNotFoundError(f"Video not found: {video_path}")
+     duration = get_duration(video_path)
+     if duration <= 0:
+         raise ValueError(f"Could not determine video duration: {video_path}")
+ 
+     start_sec = max(0, range_start)
+     end_sec = min(duration, range_end) if range_end > 0 else duration
+     if start_sec >= end_sec:
+         raise ValueError("Start time must be less than end time")
+     effective = end_sec - start_sec
+ 
+     video_id = path.stem + "_" + uuid.uuid4().hex[:8]
+     chunk_dir = Path(output_dir) / video_id
+     original_dir = chunk_dir / "original"
+     original_dir.mkdir(parents=True, exist_ok=True)
+ 
+     total_chunks = math.ceil(effective / chunk_size_sec)
+     chunks = []
+     for i in range(total_chunks):
+         ch_start = start_sec + i * chunk_size_sec
+         ch_end = min(start_sec + (i + 1) * chunk_size_sec, end_sec)
+         ch_duration = ch_end - ch_start
+         chunk_id = f"chunk_{i:03d}"
+         out_path = original_dir / f"{chunk_id}.mp4"
+         split_chunk(video_path, ch_start, ch_duration, str(out_path))
+         rel_path = f"{video_id}/original/{chunk_id}.mp4"
+         chunks.append({
+             "chunkId": chunk_id,
+             "startTime": ch_start,
+             "endTime": ch_end,
+             "originalPath": rel_path,
+             "processedPath": "",
+             "status": "pending",
+             "processedAt": "",
+             "errorMessage": "",
+         })
+         logger.info("Chunk written: %s (%.1f - %.1f s)", chunk_id, ch_start, ch_end)
+ 
+     metadata = {
+         "videoId": video_id,
+         "originalPath": video_path,
+         "chunkSize": chunk_size_sec,
+         "totalChunks": total_chunks,
+         "chunks": chunks,
+         "createdAt": datetime.datetime.utcnow().isoformat() + "Z",
+     }
+     meta_path = chunk_dir / "metadata.json"
+     with open(meta_path, "w", encoding="utf-8") as f:
+         json.dump(metadata, f, indent=2, ensure_ascii=False)
+     metadata["_metadataPath"] = str(meta_path)
+     return metadata, video_id
+ 
+ 
+ def merge_chunks(metadata: dict, start_time: float, end_time: float, output_path: str) -> bool:
+     """Merge processed chunks (sorted by startTime; concat, then an exact trim)."""
+     chunks = metadata.get("chunks") or []
+     processed = [c for c in chunks if c.get("status") == "processed" and c.get("processedPath")]
+     processed = [c for c in processed if Path(c["processedPath"]).exists()]
+     if not processed:
+         raise ValueError("No processed chunks available")
+     processed.sort(key=lambda c: c["startTime"])
+ 
+     with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
+         for c in processed:
+             f.write(f"file '{Path(c['processedPath']).resolve()}'\n")
+         list_path = f.name
+     try:
+         out_path = Path(output_path)
+         tmp_concat = out_path.parent / f"chunk_merge_{os.getpid()}.mp4"
+         tmp_trim = out_path.parent / f"chunk_trim_{os.getpid()}.mp4"
+         try:
+             merge_with_concat(list_path, str(tmp_concat))
+             first_start = processed[0]["startTime"]
+             trim_start = max(0, start_time - first_start)
+             exact_duration = end_time - start_time
+             trim_video(str(tmp_concat), trim_start, exact_duration, str(tmp_trim))
+             tmp_trim.replace(out_path)
+         finally:
+             tmp_concat.unlink(missing_ok=True)
+             tmp_trim.unlink(missing_ok=True)
+         return True
+     finally:
+         os.unlink(list_path)
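+ 
+ 
+ # The temporary concat list consumed by merge_with_concat has one line per processed
+ # chunk, e.g. (illustrative paths):
+ #   file '/server/data/a_1b2c3d4e/chunk_000_desub.mp4'
+ #   file '/server/data/a_1b2c3d4e/chunk_001_desub.mp4'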
+++ metadata.py
@@ -0,0 +1,75 @@
+ # -*- coding: utf-8 -*-
+ """
+ Read/write metadata.json and update it after subtitle removal (with a cross-process
+ file lock, matching the Java MarkerExportService).
+ """
+ import datetime
+ import json
+ import logging
+ from pathlib import Path
+ from typing import List, Dict, Any
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ def load_metadata(metadata_path: str) -> Dict[str, Any]:
+     with open(metadata_path, "r", encoding="utf-8") as f:
+         data = json.load(f)
+     return data
+ 
+ 
+ def save_metadata(metadata_path: str, data: dict) -> None:
+     # Write-then-rename keeps the file atomic for concurrent readers.
+     tmp = metadata_path + ".tmp"
+     with open(tmp, "w", encoding="utf-8") as f:
+         json.dump(data, f, indent=2, ensure_ascii=False)
+     Path(tmp).replace(metadata_path)
+ 
+ 
+ def get_processed_chunks(data: dict) -> List[Dict[str, Any]]:
+     chunks = data.get("chunks") or []
+     return [c for c in chunks if c.get("status") == "processed"]
+ 
+ 
+ def get_pending_chunks(data: dict) -> List[Dict[str, Any]]:
+     chunks = data.get("chunks") or []
+     return [c for c in chunks if c.get("status") in ("pending", "failed")]
+ 
+ 
+ def update_chunk_processed(metadata_path: str, chunk_id: str, processed_path: str) -> None:
+     """Mark the given chunk as processed and record its processedPath; a same-directory
+     check plus a file lock guard against cross-video mix-ups and lost concurrent updates."""
+     meta_path = Path(metadata_path)
+     processed_path_obj = Path(processed_path)
+     meta_dir = meta_path.parent.resolve()
+     processed_dir = processed_path_obj.parent.resolve()
+     if meta_dir != processed_dir:
+         logger.error("Cross-video guard: processedPath is not in the metadata directory, meta_dir=%s, processed_dir=%s, chunk_id=%s",
+                      meta_dir, processed_dir, chunk_id)
+         raise ValueError("processedPath must live in the same directory as metadata.json, to prevent mixing chunks across videos")
+ 
+     lock_path = Path(metadata_path + ".lock")
+     lock_path.parent.mkdir(parents=True, exist_ok=True)
+     try:
+         import fcntl
+         _use_flock = True
+     except ImportError:
+         _use_flock = False  # No fcntl on Windows: single-process use, or accept the concurrency risk
+ 
+     def _do_update():
+         data = load_metadata(metadata_path)
+         for chunk in data.get("chunks") or []:
+             if chunk.get("chunkId") == chunk_id:
+                 chunk["processedPath"] = processed_path
+                 chunk["status"] = "processed"
+                 chunk["processedAt"] = datetime.datetime.utcnow().isoformat() + "Z"
+                 save_metadata(metadata_path, data)
+                 logger.info("Marked chunk %s as processed in metadata: %s", chunk_id, processed_path)
+                 return
+         logger.warning("Chunk not found in metadata: %s", chunk_id)
+ 
+     if _use_flock:
+         with open(lock_path, "w") as lock_file:
+             fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
+             try:
+                 _do_update()
+             finally:
+                 fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
+     else:
+         _do_update()
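+ 
+ 
+ # Illustrative call (hypothetical paths): record chunk_000_desub.mp4, produced next to
+ # its metadata.json, and flip the chunk's status to "processed":
+ #   update_chunk_processed("/server/data/vid/metadata.json", "chunk_000",
+ #                          "/server/data/vid/chunk_000_desub.mp4")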
+++ pyproject.toml
@@ -0,0 +1,29 @@
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+ 
+ [project]
+ name = "videoconverter-worker"
+ version = "1.0.0"
+ description = "VideoConverter Python Worker: reads tasks from the queue directory and performs splitting, subtitle removal, and merging"
+ readme = "README.txt"
+ requires-python = ">=3.8"
+ license = { text = "MIT" }
+ keywords = ["videoconverter", "ffmpeg", "worker", "video"]
+ classifiers = [
+     "License :: OSI Approved :: MIT License",
+     "Operating System :: OS Independent",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.8",
+     "Programming Language :: Python :: 3.9",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+ ]
+ 
+ [project.scripts]
+ videoconverter = "worker:main"
+ videoconverter-worker = "worker:main"
+ 
+ [tool.setuptools]
+ py-modules = ["worker", "task_queue", "metadata", "ffmpeg_runner"]
+++ setup.cfg
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+ 
+++ task_queue.py
@@ -0,0 +1,244 @@
+ # -*- coding: utf-8 -*-
+ """
+ Queue file I/O and task claiming, following the conventions in queue_task_schema.txt.
+ (Named task_queue to avoid clashing with the standard-library queue module.)
+ """
+ import json
+ import logging
+ import os
+ import time
+ import uuid
+ from pathlib import Path
+ from typing import Optional, List, Dict, Any
+ 
+ logger = logging.getLogger(__name__)
+ 
+ QUEUE_DIR_NAME = "queue"
+ CONFIG_FILE_NAME = "queue_config.json"
+ 
+ 
+ def _format_time(seconds: float) -> str:
+     if seconds < 0:
+         return "0"
+     if seconds < 3600:
+         m = int(seconds // 60)
+         s = seconds % 60
+         return f"{m}:{s:05.2f}" if s != int(s) else f"{m}:{int(s):02d}"
+     h = int(seconds // 3600)
+     m = int((seconds % 3600) // 60)
+     s = seconds % 60
+     return f"{h}:{m:02d}:{s:05.2f}" if s != int(s) else f"{h}:{m:02d}:{int(s):02d}"
+ 
+ 
+ class QueueStore:
+     def __init__(self, data_dir: str, path_replace: Optional[tuple] = None):
+         """
+         :param data_dir: data directory, e.g. ~/.videoconverter/data
+         :param path_replace: (old_prefix, new_prefix); rewrites path prefixes in tasks for local-to-server moves
+         """
+         self.data_dir = Path(data_dir).resolve()
+         self.queue_dir = self.data_dir / QUEUE_DIR_NAME
+         self.config_file = self.data_dir / CONFIG_FILE_NAME
+         self.path_replace = path_replace  # (old, new)
+         self._ensure_dirs()
+ 
+     def _ensure_dirs(self) -> None:
+         self.queue_dir.mkdir(parents=True, exist_ok=True)
+         if not self.config_file.exists():
+             self._write_config({"queue_paused": "true"})
+ 
+     def _apply_path(self, path: str) -> str:
+         if not path or not self.path_replace:
+             return path
+         old, new = self.path_replace
+         if path.startswith(old):
+             # Append the remainder verbatim; it keeps its leading separator, so the new
+             # prefix is never glued directly onto the next path component.
+             return new + path[len(old):]
+         return path
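+     # e.g. with path_replace = ("/Users/me/videos", "/server/videos"):
+     #   _apply_path("/Users/me/videos/a.mp4") -> "/server/videos/a.mp4"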
56
+
57
+ def _read_config(self) -> dict:
58
+ if not self.config_file.exists():
59
+ return {}
60
+ try:
61
+ with open(self.config_file, "r", encoding="utf-8") as f:
62
+ return json.load(f)
63
+ except Exception:
64
+ return {}
65
+
66
+ def _write_config(self, config: dict) -> None:
67
+ tmp = self.config_file.with_suffix(self.config_file.suffix + ".tmp")
68
+ with open(tmp, "w", encoding="utf-8") as f:
69
+ json.dump(config, f, indent=2, ensure_ascii=False)
70
+ tmp.replace(self.config_file)
71
+
72
+ def is_paused(self) -> bool:
73
+ cfg = self._read_config()
74
+ return cfg.get("queue_paused", "true") == "true"
75
+
76
+ def set_paused(self, paused: bool) -> None:
77
+ cfg = self._read_config()
78
+ cfg["queue_paused"] = "true" if paused else "false"
79
+ self._write_config(cfg)
80
+
81
+ def _task_file(self, task_id: str) -> Path:
82
+ return self.queue_dir / f"{task_id}.json"
83
+
84
+ def _lock_file(self, task_id: str) -> Path:
85
+ return self.queue_dir / f"{task_id}.lock"
86
+
87
+ def _log_file(self, task_id: str) -> Path:
88
+ return self.queue_dir / f"{task_id}.log"
89
+
90
+ def _list_task_files(self) -> List[Path]:
91
+ if not self.queue_dir.exists():
92
+ return []
93
+ return sorted(self.queue_dir.glob("*.json"), key=lambda p: p.name)
94
+
95
+ def acquire_pending_task(self) -> Optional[Dict[str, Any]]:
96
+ """原子抢占一个 PENDING 任务,返回任务 dict 或 None。"""
97
+ files = self._list_task_files()
98
+ pending_with_time = []
99
+ for p in files:
100
+ try:
101
+ with open(p, "r", encoding="utf-8") as f:
102
+ data = json.load(f)
103
+ if data.get("status") != "PENDING":
104
+ continue
105
+ pending_with_time.append((data.get("created_time", 0), p))
106
+ except Exception as e:
107
+ logger.warning("读取任务文件失败 %s: %s", p.name, e)
108
+ pending_with_time.sort(key=lambda x: x[0])
109
+
110
+ for _, path in pending_with_time:
111
+ task_id = path.stem
112
+ lock_path = self._lock_file(task_id)
113
+ try:
114
+ with open(lock_path, "x"):
115
+ pass
116
+ except FileExistsError:
117
+ continue
118
+ try:
119
+ with open(path, "r", encoding="utf-8") as f:
120
+ data = json.load(f)
121
+ if data.get("status") != "PENDING":
122
+ lock_path.unlink(missing_ok=True)
123
+ continue
124
+ data["status"] = "PROCESSING"
125
+ data["start_time"] = int(__import__("time").time() * 1000)
126
+ with open(path, "w", encoding="utf-8") as f:
127
+ json.dump(data, f, indent=2, ensure_ascii=False)
128
+ self._apply_paths_to_task(data)
129
+ logger.debug("抢占任务: %s", task_id)
130
+ return data
131
+ except Exception as e:
132
+ logger.exception("抢占任务失败 %s: %s", task_id, e)
133
+ raise
134
+ finally:
135
+ lock_path.unlink(missing_ok=True)
136
+ return None
+ 
+     def _apply_paths_to_task(self, task: dict) -> None:
+         for key in ("input_file", "output_dir"):
+             if key in task and task[key]:
+                 task[key] = self._apply_path(task[key])
+         cfg = task.get("config") or {}
+         for key in ("inputPath", "outputPath"):
+             if key in cfg and cfg[key]:
+                 cfg[key] = self._apply_path(cfg[key])
+         task["config"] = cfg
+ 
+     def update_progress(self, task_id: str, progress: float, progress_text: str) -> None:
+         self._update_task(task_id, {
+             "progress": max(0, min(100, progress)),
+             "progress_text": progress_text or "",
+         })
+ 
+     def complete_task(self, task_id: str) -> None:
+         self._update_task(task_id, {
+             "status": "COMPLETED",
+             "progress": 100.0,
+             "progress_text": "Completed",
+             "end_time": int(time.time() * 1000),
+         })
+ 
+     def fail_task(self, task_id: str, error_message: str) -> None:
+         self._update_task(task_id, {
+             "status": "FAILED",
+             "end_time": int(time.time() * 1000),
+             "error_message": error_message or "",
+         })
+ 
+     def _update_task(self, task_id: str, updates: dict) -> None:
+         path = self._task_file(task_id)
+         if not path.exists():
+             return
+         with open(path, "r", encoding="utf-8") as f:
+             data = json.load(f)
+         data.update(updates)
+         tmp = path.with_suffix(path.suffix + ".tmp")
+         with open(tmp, "w", encoding="utf-8") as f:
+             json.dump(data, f, indent=2, ensure_ascii=False)
+         tmp.replace(path)
+ 
+     def add_log(self, task_id: str, level: str, message: str) -> None:
+         log_path = self._log_file(task_id)
+         line = f"{int(time.time() * 1000)}\t{level}\t{message}\n"
+         with open(log_path, "a", encoding="utf-8") as f:
+             f.write(line)
+ 
+     def get_tasks_by_video_id(self, video_id: str) -> List[Dict[str, Any]]:
+         out = []
+         for p in self._list_task_files():
+             try:
+                 with open(p, "r", encoding="utf-8") as f:
+                     d = json.load(f)
+                 if d.get("video_id") == video_id:
+                     self._apply_paths_to_task(d)
+                     out.append(d)
+             except Exception:
+                 pass
+         out.sort(key=lambda x: x.get("created_time", 0))
+         return out
+ 
+     def create_desubtitle_task(self, input_file: str, output_dir: str, config: dict,
+                                parent_task_id: str, video_id: str) -> str:
+         task_id = str(uuid.uuid4())
+         task = {
+             "task_id": task_id,
+             "task_type": "DESUBTITLE",
+             "parent_task_id": parent_task_id or "",
+             "video_id": video_id or "",
+             "input_file": input_file,
+             "output_dir": output_dir,
+             "config": dict(config) if config else {},
+             "status": "PENDING",
+             "progress": 0.0,
+             "progress_text": "Waiting for subtitle removal...",
+             "created_time": int(time.time() * 1000),
+             "error_message": "",
+         }
+         path = self._task_file(task_id)
+         with open(path, "w", encoding="utf-8") as f:
+             json.dump(task, f, indent=2, ensure_ascii=False)
+         logger.info("Created subtitle-removal task: %s -> %s", input_file, task_id)
+         return task_id
+ 
+     def create_merge_task(self, video_id: str, output_dir: str, config: dict) -> str:
+         task_id = str(uuid.uuid4())
+         task = {
+             "task_id": task_id,
+             "task_type": "MERGE",
+             "parent_task_id": "",
+             "video_id": video_id,
+             "input_file": "",
+             "output_dir": output_dir,
+             "config": dict(config) if config else {},
+             "status": "PENDING",
+             "progress": 0.0,
+             "progress_text": "Waiting for merge...",
+             "created_time": int(time.time() * 1000),
+             "error_message": "",
+         }
+         path = self._task_file(task_id)
+         with open(path, "w", encoding="utf-8") as f:
+             json.dump(task, f, indent=2, ensure_ascii=False)
+         logger.info("Created merge task: videoId=%s -> %s", video_id, task_id)
+         return task_id
+++ videoconverter_worker.egg-info/PKG-INFO
@@ -0,0 +1,81 @@
+ Metadata-Version: 2.4
+ Name: videoconverter-worker
+ Version: 1.0.0
+ Summary: VideoConverter Python Worker: reads tasks from the queue directory and performs splitting, subtitle removal, and merging
+ License: MIT
+ Keywords: videoconverter,ffmpeg,worker,video
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Python: >=3.8
+ Description-Content-Type: text/plain
+ 
+ Python Worker Usage Guide
+ 
+ 0) Installation (pick one)
+ - Install from PyPI (recommended; works on Windows/Linux/macOS):
+     pip install videoconverter-worker
+   After installing, run: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install from local source: cd src/python && pip install .
+ 
+ 1) What it does
+ Mirrors the behavior of the Java BackendWorker: reads tasks from the queue directory (queue/<task_id>.json),
+ executes SPLIT / DESUBTITLE / MERGE / ONE_CLICK_COMPOSE, and updates task status and metadata.json.
+ Suited to running on a server, using its many cores and faster hardware for splitting, subtitle removal, and merging.
+ 
+ 2) Requirements
+ - Python 3.8+
+ - ffmpeg and ffprobe installed on the system (or available on PATH)
+ - Optional: point the FFMPEG_PATH and FFPROBE_PATH environment variables at the binaries
+ 
+ 3) Sharing one queue with Java on the same machine
+ - Just use the same data directory, e.g.: ~/.videoconverter/data
+ - Have the Java frontend create tasks first (writing queue/*.json), then run the Python worker locally to process them:
+     cd src/python
+     python worker.py
+ - Or point it at a specific data directory:
+     python worker.py --data-dir /path/to/.videoconverter/data
+ 
+ 4) Running on a server after copying over a folder prepared by the frontend
+ - Copy the entire queue "data directory" from your local machine to the server (e.g. /server/data);
+   it should contain the queue/ directory and the queue/*.json task files.
+ - If the paths inside the task JSON are local absolute paths (e.g. /Users/me/videos/a.mp4),
+   rewrite them on the server, or the files will not be found:
+     python worker.py --data-dir /server/data --path-replace "/Users/me/videos=/server/videos"
+ - Environment-variable form (convenient for scripts/systemd):
+     export VIDEOCONVERTER_DATA_DIR=/server/data
+     export VIDEOCONVERTER_PATH_REPLACE="/Users/me/videos=/server/videos"
+     python worker.py
+ - Tip: keep videos in a fixed directory on the server (e.g. /server/videos), use a single local
+   prefix for all paths in the copied queue JSON, and swap it for the server prefix with --path-replace.
+ 
+ 5) Multi-process concurrency
+ The current worker is a single-process, single-threaded loop. To saturate multiple cores, start several processes on the same data-dir:
+ - Task claiming works by atomically creating queue/<task_id>.lock, so multiple processes never grab the same task.
+ - Example (4 worker processes):
+     for i in 1 2 3 4; do python worker.py --data-dir /server/data & done
+   Or run several worker instances under systemd/supervisor.
+ 
+ 6) Pause/resume
+ Queue pausing is controlled by the queue_paused key in queue_config.json ("true" means paused).
+ The Python worker re-reads this config periodically and takes no new tasks while it is "true".
+ It can be toggled from the Java frontend or by editing the file by hand.
+ 
+ 7) Conventions shared with Java
+ - The task and config formats are documented in queue_task_schema.txt at the project root.
+ - metadata.json matches the format the Java side produces; merging (MERGE) is driven by the metadata.
+ 
+ 8) Packaging and deployment
+ - Build a zip (copy to the server, unzip, run):
+     cd src/python
+     ./build_deploy.sh
+   This produces videoconverter.zip; unzip it and run, inside that directory:
+     python3 worker.py [--data-dir DIR] [--path-replace OLD=NEW]
+ - Or install it as a command-line tool (works locally or on the server):
+     cd src/python
+     pip install .
+   Then run it directly: videoconverter [--data-dir DIR] [--path-replace OLD=NEW]
+++ videoconverter_worker.egg-info/SOURCES.txt
@@ -0,0 +1,11 @@
+ README.txt
+ ffmpeg_runner.py
+ metadata.py
+ pyproject.toml
+ task_queue.py
+ worker.py
+ videoconverter_worker.egg-info/PKG-INFO
+ videoconverter_worker.egg-info/SOURCES.txt
+ videoconverter_worker.egg-info/dependency_links.txt
+ videoconverter_worker.egg-info/entry_points.txt
+ videoconverter_worker.egg-info/top_level.txt
+++ videoconverter_worker.egg-info/entry_points.txt
@@ -0,0 +1,3 @@
+ [console_scripts]
+ videoconverter = worker:main
+ videoconverter-worker = worker:main
+++ videoconverter_worker.egg-info/top_level.txt
@@ -0,0 +1,4 @@
+ ffmpeg_runner
+ metadata
+ task_queue
+ worker
+++ worker.py
@@ -0,0 +1,266 @@
+ # -*- coding: utf-8 -*-
+ """
+ Python worker: reads tasks from the queue directory and performs splitting,
+ subtitle removal, and merging, mirroring the Java BackendWorker.
+ Usage:
+     python worker.py [--data-dir DIR] [--path-replace OLD=NEW] [--workers N]
+ Or set the environment variables VIDEOCONVERTER_DATA_DIR and VIDEOCONVERTER_PATH_REPLACE (OLD=NEW).
+ """
+ import argparse
+ import logging
+ import os
+ import sys
+ import time
+ from pathlib import Path
+ 
+ from task_queue import QueueStore
+ from metadata import load_metadata, get_processed_chunks, get_pending_chunks, update_chunk_processed
+ from ffmpeg_runner import (
+     split_video_to_chunks,
+     run_desubtitle,
+     merge_chunks,
+     get_duration,
+ )
+ 
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+     datefmt="%Y-%m-%d %H:%M:%S",
+ )
+ logger = logging.getLogger("worker")
+ 
+ 
+ def process_split_task(store: QueueStore, task: dict) -> None:
+     task_id = task["task_id"]
+     input_file = task["input_file"]
+     output_dir = task["output_dir"]
+     config = task.get("config") or {}
+     range_start = config.get("startTime", 0) or 0
+     range_end = config.get("endTime", 0) or 0
+ 
+     store.add_log(task_id, "INFO", f"Splitting video: {Path(input_file).name} (range: {range_start} - {range_end} s)")
+     metadata, video_id = split_video_to_chunks(input_file, output_dir, 120.0, range_start, range_end)
+     store.add_log(task_id, "INFO", f"Split complete: {metadata['totalChunks']} chunks, metadata: {metadata.get('_metadataPath', '')}")
+ 
+     store.complete_task(task_id)
+ 
+     # Create one subtitle-removal task per chunk
+     chunk_files = []
+     for ch in metadata.get("chunks") or []:
+         rel = ch.get("originalPath", "")
+         if rel:
+             chunk_path = Path(output_dir) / rel
+             chunk_files.append(str(chunk_path))
+ 
+     for chunk_file in chunk_files:
+         store.create_desubtitle_task(chunk_file, output_dir, config, task_id, video_id)
+     store.add_log(task_id, "INFO", f"Created {len(chunk_files)} subtitle-removal tasks")
+ 
+ 
+ def process_desubtitle_task(store: QueueStore, task: dict) -> None:
+     task_id = task["task_id"]
+     input_file = task["input_file"]
+     output_dir = task["output_dir"]
+     video_id = task.get("video_id") or ""
+     config = dict(task.get("config") or {})
+ 
+     input_path = Path(input_file)
+     if not input_path.exists():
+         store.fail_task(task_id, f"Input file not found: {input_file}")
+         return
+ 
+     store.add_log(task_id, "INFO", f"Removing subtitles: {input_path.name}")
+ 
+     output_dir_for_file = Path(output_dir) / video_id if video_id else Path(output_dir)
+     output_dir_for_file.mkdir(parents=True, exist_ok=True)
+     output_name = input_path.stem + "_desub.mp4"
+     output_file = output_dir_for_file / output_name
+ 
+     config["inputPath"] = input_file
+     config["outputPath"] = str(output_file)
+     if video_id:
+         # Chunks are already cut to their range; process the whole chunk and force a
+         # keyframe at frame 0 so the later concat-based merge stays clean.
+         config["startTime"] = 0
+         config["endTime"] = 0
+         config["forceKeyframeAtStart"] = True
+ 
+     try:
+         run_desubtitle(config, input_file, str(output_file))
+     except Exception as e:
+         store.fail_task(task_id, str(e))
+         store.add_log(task_id, "WARN", str(e))
+         return
+ 
+     store.complete_task(task_id)
+     store.add_log(task_id, "INFO", f"Subtitle removal complete: {output_name}")
+ 
+     if video_id:
+         metadata_path = Path(output_dir) / video_id / "metadata.json"
+         if metadata_path.exists():
+             chunk_id = input_path.stem
+             try:
+                 update_chunk_processed(str(metadata_path), chunk_id, str(output_file))
+                 check_and_create_merge_task(store, video_id, output_dir, config)
+             except Exception as e:
+                 logger.warning("Failed to update metadata or check for merge: videoId=%s, chunkId=%s, %s", video_id, chunk_id, e)
+ 
+ 
+ def check_and_create_merge_task(store: QueueStore, video_id: str, output_dir: str, config: dict) -> None:
+     metadata_path = Path(output_dir) / video_id / "metadata.json"
+     if not metadata_path.exists():
+         return
+     data = load_metadata(str(metadata_path))
+     chunks = data.get("chunks") or []
+     total = len(chunks)
+     processed = get_processed_chunks(data)
+     pending = get_pending_chunks(data)
+     existing = store.get_tasks_by_video_id(video_id)
+     has_merge = any(t.get("task_type") == "MERGE" for t in existing)
+ 
+     if total > 0 and len(processed) == total and not pending and not has_merge:
+         store.create_merge_task(video_id, output_dir, config)
+         logger.info("Auto-created merge task: videoId=%s, %d/%d chunks processed", video_id, len(processed), total)
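+ 
+ 
+ # Lifecycle note (from the handlers above): SPLIT and ONE_CLICK_COMPOSE fan out one
+ # DESUBTITLE task per chunk; the last DESUBTITLE to finish trips this check and
+ # enqueues a single MERGE task for the whole video.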
+ 
+ 
+ def process_merge_task(store: QueueStore, task: dict) -> None:
+     task_id = task["task_id"]
+     video_id = task.get("video_id")
+     output_dir = task["output_dir"]
+ 
+     store.add_log(task_id, "INFO", f"Merging video: videoId={video_id}")
+ 
+     metadata_path = Path(output_dir) / video_id / "metadata.json"
+     if not metadata_path.exists():
+         store.fail_task(task_id, f"Metadata file not found: {metadata_path}")
+         return
+ 
+     data = load_metadata(str(metadata_path))
+     processed = get_processed_chunks(data)
+     if not processed:
+         store.fail_task(task_id, "No processed chunks")
+         return
+ 
+     start_time = processed[0]["startTime"]
+     end_time = processed[-1]["endTime"]
+     output_file = Path(output_dir) / f"{video_id}_merged.mp4"
+ 
+     store.add_log(task_id, "INFO", f"Merging {len(processed)} chunks, time range: {start_time} - {end_time} s")
+ 
+     try:
+         merge_chunks(data, start_time, end_time, str(output_file))
+         store.complete_task(task_id)
+         store.add_log(task_id, "INFO", f"Merge complete: {output_file.name}")
+     except Exception as e:
+         store.fail_task(task_id, str(e))
+         store.add_log(task_id, "WARN", str(e))
+ 
+ 
+ def process_one_click_compose_task(store: QueueStore, task: dict) -> None:
+     task_id = task["task_id"]
+     input_file = task["input_file"]
+     output_dir = task["output_dir"]
+     config = task.get("config") or {}
+     range_start = config.get("startTime", 0) or 0
+     range_end = config.get("endTime", 0) or 0
+ 
+     store.add_log(task_id, "INFO", f"One-click compose started: {Path(input_file).name} (videoId={task.get('video_id', '')})")
+ 
+     metadata, folder_video_id = split_video_to_chunks(input_file, output_dir, 120.0, range_start, range_end)
+     store.add_log(task_id, "INFO", f"Split complete: {metadata['totalChunks']} chunks")
+ 
+     store.complete_task(task_id)
+ 
+     chunk_files = []
+     for ch in metadata.get("chunks") or []:
+         rel = ch.get("originalPath", "")
+         if rel:
+             chunk_files.append(str(Path(output_dir) / rel))
+ 
+     for cf in chunk_files:
+         store.create_desubtitle_task(cf, output_dir, config, task_id, folder_video_id)
+     store.add_log(task_id, "INFO", f"Created {len(chunk_files)} subtitle-removal tasks")
+ 
+ 
+ def work_loop(store: QueueStore) -> None:
+     while True:
+         try:
+             if store.is_paused():
+                 time.sleep(1)
+                 continue
+ 
+             task = store.acquire_pending_task()
+             if task is None:
+                 time.sleep(1)
+                 continue
+ 
+             task_id = task["task_id"]
+             task_type = task.get("task_type", "")
+             logger.info("Processing task: %s (type: %s)", task_id, task_type)
+ 
+             try:
+                 if task_type == "SPLIT":
+                     process_split_task(store, task)
+                 elif task_type == "DESUBTITLE":
+                     process_desubtitle_task(store, task)
+                 elif task_type == "MERGE":
+                     process_merge_task(store, task)
+                 elif task_type == "ONE_CLICK_COMPOSE":
+                     process_one_click_compose_task(store, task)
+                 else:
+                     store.fail_task(task_id, f"Unknown task type: {task_type}")
+             except Exception as e:
+                 logger.exception("Task failed: %s", task_id)
+                 store.fail_task(task_id, str(e))
+                 store.add_log(task_id, "WARN", str(e))
+ 
+         except KeyboardInterrupt:
+             logger.info("Interrupted, exiting")
+             break
+         except Exception as e:
+             logger.exception("Work loop error: %s", e)
+             time.sleep(5)
+ 
+ 
+ def main() -> int:
+     parser = argparse.ArgumentParser(description="VideoConverter Python Worker")
+     parser.add_argument(
+         "--data-dir",
+         default=None,
+         help="Data directory (default: $VIDEOCONVERTER_DATA_DIR or ~/.videoconverter/data)",
+     )
+     parser.add_argument(
+         "--path-replace",
+         default=None,
+         help="Path prefix replacement for tasks copied to a server: OLD=NEW, e.g. /Users/me/videos=/data/videos",
+     )
+     parser.add_argument(
+         "--workers",
+         type=int,
+         default=1,
+         help="Number of concurrent workers (default 1; for multi-process use, start extra worker processes externally)",
+     )
+     args = parser.parse_args()
+ 
+     data_dir = args.data_dir or os.environ.get(
+         "VIDEOCONVERTER_DATA_DIR",
+         str(Path.home() / ".videoconverter" / "data"),
+     )
+     # The --path-replace flag wins over the VIDEOCONVERTER_PATH_REPLACE environment variable.
+     path_replace = None
+     if args.path_replace:
+         if "=" in args.path_replace:
+             a, b = args.path_replace.split("=", 1)
+             path_replace = (a.strip(), b.strip())
+     env_replace = os.environ.get("VIDEOCONVERTER_PATH_REPLACE")
+     if env_replace and "=" in env_replace and path_replace is None:
+         a, b = env_replace.split("=", 1)
+         path_replace = (a.strip(), b.strip())
+ 
+     store = QueueStore(data_dir, path_replace=path_replace)
+     logger.info("Data directory: %s", data_dir)
+     logger.info("Queue directory: %s", store.queue_dir)
+     if path_replace:
+         logger.info("Path replacement: %s -> %s", path_replace[0], path_replace[1])
+ 
+     work_loop(store)
+     return 0
+ 
+ 
+ if __name__ == "__main__":
+     sys.exit(main())