PyPI - videopython - Versions diffs - 0.30.0__tar.gz → 0.31.0__tar.gz - Mend

videopython 0.30.0__tar.gz → 0.31.0__tar.gz

Files changed (55)

{videopython-0.30.0 → videopython-0.31.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: videopython
-Version: 0.30.0
+Version: 0.31.0
 Summary: Minimal video generation and processing library.
 Project-URL: Homepage, https://videopython.com
 Project-URL: Repository, https://github.com/bartwojtowicz/videopython/
@@ -85,22 +85,31 @@ Python `>=3.10, <3.14`. AI features run locally - no cloud API keys required, bu
 ## Quick Start
-### Video editing
+### Imperative editing
+Every editing primitive is an `Operation` subclass — a Pydantic model
+whose fields ARE the JSON wire format. Apply one to a `Video`:
+```python
+from videopython.base import Video, CutSeconds, Resize, Fade
+video = Video.from_path("raw.mp4")
+video = CutSeconds(start=10, end=25).apply(video)
+video = Resize(width=1080, height=1920).apply(video)
+video = Fade(mode="in", duration=0.5).apply(video)
+video.save("output.mp4")
+```
+Concatenate clips with `+` (must share fps + dimensions):
 ```python
-from videopython import Video
-from videopython.base import FadeTransition
-intro = Video.from_path("intro.mp4").resize(1080, 1920)
-clip = Video.from_path("raw.mp4").cut(10, 25).resize(1080, 1920).resample_fps(30)
-final = intro.transition_to(clip, FadeTransition(effect_time_seconds=0.5))
-final = final.add_audio_from_file("music.mp3")
-final.save("output.mp4")
+combined = video_a + video_b
 ```
 ### JSON editing plans
-Define multi-segment edits as JSON - useful for LLM-driven workflows. `VideoEdit.json_schema()` returns a schema for plan generation/validation.
+Define multi-segment edits as JSON — the format LLM-driven workflows
+generate against. `VideoEdit.json_schema()` returns the schema:
 ```python
 from videopython.editing import VideoEdit
@@ -110,68 +119,89 @@ plan = {
         "source": "raw.mp4",
         "start": 10.0,
         "end": 20.0,
-        "transforms": [
-            {"op": "resize", "args": {"height": 1280}},
-            {"op": "speed_change", "args": {"speed": 1.25}},
+        "operations": [
+            {"op": "resize", "width": 1080, "height": 1920},
+            {"op": "color_adjust", "saturation": 1.15, "contrast": 1.05},
+            {"op": "fade", "mode": "in", "duration": 0.5,
+             "window": {"stop": 0.5}},
         ],
     }],
-    "post_effects": [
-        {"op": "fade", "args": {"mode": "in", "duration": 0.5}, "apply": {"start": 0.0, "stop": 0.5}},
-    ],
 }
 edit = VideoEdit.from_dict(plan)
-edit.validate()   # dry-run via metadata (no frame loading)
-final = edit.run()
-final.save("output.mp4")
+edit.validate()                  # dry-run via metadata, no frames loaded
+edit.run_to_file("output.mp4")   # stream to disk, ~constant memory
 ```
+`run_to_file()` pipes ffmpeg decode → per-frame effects → ffmpeg encode,
+so memory stays bounded even for hour-long sources. Use `edit.run()`
+instead if you want the result back in memory as a `Video`.
 ### AI generation
 ```python
 from videopython.ai import TextToImage, ImageToVideo, TextToSpeech
+from videopython.base import Resize
 image = TextToImage().generate_image("A cinematic mountain sunrise")
-video = ImageToVideo().generate_video(image=image).resize(1080, 1920)
+video = ImageToVideo().generate_video(image=image)
 audio = TextToSpeech().generate_audio("Welcome to videopython.")
+video = Resize(width=1080, height=1920).apply(video)
 video.add_audio(audio).save("ai_video.mp4")
 ```
 ## LLM & AI Agent Integration
-videopython is designed to be controlled by LLMs. Every video operation exposes a machine-readable spec with descriptions, parameter types, and value constraints - all available as JSON Schema at runtime.
+The library is built for LLM-driven editing. Two surfaces matter:
-**Schema generation** - `VideoEdit.json_schema()` returns a complete JSON Schema describing valid edit plans. Pass it directly as a tool schema or structured-output format to any LLM API:
+**1. Plan schema for tool / structured-output calls.**
+`VideoEdit.json_schema()` returns a JSON Schema covering segments,
+`post_operations`, and a discriminated union over every registered
+`Operation`. Drop it into any LLM API:
 ```python
 from videopython.editing import VideoEdit
 schema = VideoEdit.json_schema()
-# Pass `schema` to your LLM as a function/tool definition or response format.
-# The LLM generates a plan dict, then:
+# Anthropic: tools=[{"name": "edit", "input_schema": schema}]
+# OpenAI:    tools=[{"type": "function",
+#                    "function": {"name": "edit", "parameters": schema}}]
+```
+Validate the LLM's output without touching the filesystem, then run it:
+```python
 edit = VideoEdit.from_dict(plan)
-edit.validate()   # dry-run: checks sources, time ranges, params - no frames loaded
-final = edit.run()
-final.save("output.mp4")
+edit.validate()                  # catches bad ops, time ranges, fps mismatches
+edit.run_to_file("output.mp4")
 ```
-**Operation discovery** - the registry lets an LLM (or your code) inspect all available operations, their parameters, and constraints:
+**2. Operation discovery for agent loops.**
+Every registered op exposes its own Pydantic schema, so an agent can
+introspect what's available without hardcoded lists:
 ```python
-from videopython.base import get_operation_specs, get_specs_by_category, OperationCategory
+from videopython.base import Operation, OpCategory
-all_ops = get_operation_specs()                                    # all registered operations
-transforms = get_specs_by_category(OperationCategory.TRANSFORMATION)  # just transforms
+for op_id, cls in Operation.registry().items():
+    print(f"{op_id}: {(cls.__doc__ or '').splitlines()[0]}")
-spec = all_ops["color_adjust"]
-print(spec.description)       # LLM-friendly docstring
-print(spec.to_json_schema())  # {"brightness": {"type": "number", "minimum": -1, "maximum": 1}, ...}
+schema = Operation.get("color_adjust").model_json_schema()  # per-op schema
 ```
-Every operation has LLM-optimized descriptions and rich constraints (`minimum`, `maximum`, `enum`, `exclusive_minimum`, etc.) so models generate valid parameters on the first try.
+Field constraints (`minimum`, `maximum`, `enum`, `exclusiveMinimum`,
+nullability) flow through to the schema, so LLMs that support
+constrained generation produce valid parameters on the first try.
+For ops that need side-channel data (e.g. `silence_removal` and
+`add_subtitles` need a `Transcription`), pass it via `context`:
+```python
+edit.run(context={"transcription": my_transcription})
+```
-Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operation Registry](https://videopython.com/api/registry/)
+Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operations](https://videopython.com/api/operations/) | [LLM Integration Guide](https://videopython.com/guides/llm-integration/)
 ## Features
@@ -180,16 +210,15 @@ Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operation Registr
 | Area | Highlights |
 |---|---|
 | **Video I/O** | `Video`, `VideoMetadata`, `FrameIterator` - load, save, inspect |
-| **Editing plans** | `VideoEdit`, `SegmentConfig` - JSON/LLM-friendly multi-segment plans with full JSON Schema generation, dry-run validation, and operation registry |
-| **Multicam editing** | `MultiCamEdit`, `CutPoint` - switch between synchronized camera angles with transitions, replace audio with external track |
-| **Transforms** | Cut (time/frame), resize, crop, FPS resampling, speed change, picture-in-picture, reverse, freeze frame, silence removal |
-| **Transitions** | `FadeTransition`, `BlurTransition`, `InstantTransition` |
+| **Operation foundation** | `Operation`, `Effect`, `TimeRange`, `OpCategory` - Pydantic base + auto-registry + discriminated-union schema |
+| **Editing plans** | `VideoEdit`, `SegmentConfig` - JSON/LLM-friendly multi-segment plans with JSON Schema generation, dry-run validation, and streaming `run_to_file` |
+| **Transforms** | Cut (time/frame), resize, crop, FPS resampling, speed change, reverse, freeze frame, silence removal |
 | **Effects** | Blur, zoom, color grading, vignette, Ken Burns, image overlay, fade, text overlay, volume adjust |
 | **Audio** | Load/save, overlay, concat, normalize, time-stretch, silence detection, segment classification |
 | **Text** | Transcription data classes, `TranscriptionOverlay` for subtitle rendering |
 | **Scene detection** | Histogram-based scene boundaries (`detect`, `detect_streaming`, `detect_parallel`) |
-API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopython.com/api/core/video/) | [Audio](https://videopython.com/api/core/audio/) | [Editing Plans](https://videopython.com/api/editing/) | [Transforms](https://videopython.com/api/transforms/) | [Transitions](https://videopython.com/api/transitions/) | [Effects](https://videopython.com/api/effects/) | [Text](https://videopython.com/api/text/)
+API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopython.com/api/core/video/) | [Audio](https://videopython.com/api/core/audio/) | [Editing Plans](https://videopython.com/api/editing/) | [Operations](https://videopython.com/api/operations/) | [Transforms](https://videopython.com/api/transforms/) | [Effects](https://videopython.com/api/effects/) | [Text](https://videopython.com/api/text/)
 ### `videopython.ai` - local AI features (install with `[ai]`)
@@ -199,7 +228,7 @@ API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopyth
 | **Understanding** | `AudioToText` (transcription), `AudioClassifier`, `SceneVLM` (structured visual scene description), `FaceTracker` (per-shot face tracks) |
 | **Scene detection** | `SemanticSceneDetector` (neural scene boundaries) |
 | **Video analysis** | `VideoAnalyzer` - full-pipeline analysis combining multiple AI capabilities |
-| **Transforms** | `FaceTrackingCrop`, `SplitScreenComposite` |
+| **Transforms** | `FaceTrackingCrop` |
 | **Dubbing** | `VideoDubber` - voice cloning and revoicing with timing sync |
 API docs: [Generation](https://videopython.com/api/ai/generation/) | [Understanding](https://videopython.com/api/ai/understanding/) | [Transforms](https://videopython.com/api/ai/transforms/) | [Dubbing](https://videopython.com/api/ai/dubbing/)
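
Note on the plan format change in the hunks above: per-op parameters move out of the nested `args` dict to sit beside `op`, the segment key `transforms` becomes `operations`, and the old `apply` time range shows up as `window`. The following is a minimal sketch of that flattening for a single op entry, working on plain dicts only. The helper name `flatten_op` is hypothetical and is not part of videopython, and treating `apply` as equivalent to `window` is an assumption read off the before/after README examples rather than a documented migration path.

```python
from typing import Any, Dict


def flatten_op(old: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a 0.30.0-style op entry to the flat 0.31.0-style shape.

    Hypothetical helper for illustration; not a videopython API.
    """
    new: Dict[str, Any] = {"op": old["op"]}
    # Parameters formerly nested under "args" now sit next to "op".
    new.update(old.get("args", {}))
    # Assumption: the old "apply" range corresponds to the new "window" key.
    if "apply" in old:
        new["window"] = dict(old["apply"])
    return new


old_fade = {
    "op": "fade",
    "args": {"mode": "in", "duration": 0.5},
    "apply": {"start": 0.0, "stop": 0.5},
}
print(flatten_op(old_fade))
```

Any converted plan should still go through `edit.validate()`, since the sketch only reshapes keys and does not check them against the new schema.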

{videopython-0.30.0 → videopython-0.31.0}/README.md RENAMED

@@ -36,22 +36,31 @@ Python `>=3.10, <3.14`. AI features run locally - no cloud API keys required, bu
 ## Quick Start
-### Video editing
+### Imperative editing
+Every editing primitive is an `Operation` subclass — a Pydantic model
+whose fields ARE the JSON wire format. Apply one to a `Video`:
+```python
+from videopython.base import Video, CutSeconds, Resize, Fade
+video = Video.from_path("raw.mp4")
+video = CutSeconds(start=10, end=25).apply(video)
+video = Resize(width=1080, height=1920).apply(video)
+video = Fade(mode="in", duration=0.5).apply(video)
+video.save("output.mp4")
+```
+Concatenate clips with `+` (must share fps + dimensions):
 ```python
-from videopython import Video
-from videopython.base import FadeTransition
-intro = Video.from_path("intro.mp4").resize(1080, 1920)
-clip = Video.from_path("raw.mp4").cut(10, 25).resize(1080, 1920).resample_fps(30)
-final = intro.transition_to(clip, FadeTransition(effect_time_seconds=0.5))
-final = final.add_audio_from_file("music.mp3")
-final.save("output.mp4")
+combined = video_a + video_b
 ```
 ### JSON editing plans
-Define multi-segment edits as JSON - useful for LLM-driven workflows. `VideoEdit.json_schema()` returns a schema for plan generation/validation.
+Define multi-segment edits as JSON — the format LLM-driven workflows
+generate against. `VideoEdit.json_schema()` returns the schema:
 ```python
 from videopython.editing import VideoEdit
@@ -61,68 +70,89 @@ plan = {
         "source": "raw.mp4",
         "start": 10.0,
         "end": 20.0,
-        "transforms": [
-            {"op": "resize", "args": {"height": 1280}},
-            {"op": "speed_change", "args": {"speed": 1.25}},
+        "operations": [
+            {"op": "resize", "width": 1080, "height": 1920},
+            {"op": "color_adjust", "saturation": 1.15, "contrast": 1.05},
+            {"op": "fade", "mode": "in", "duration": 0.5,
+             "window": {"stop": 0.5}},
         ],
     }],
-    "post_effects": [
-        {"op": "fade", "args": {"mode": "in", "duration": 0.5}, "apply": {"start": 0.0, "stop": 0.5}},
-    ],
 }
 edit = VideoEdit.from_dict(plan)
-edit.validate()   # dry-run via metadata (no frame loading)
-final = edit.run()
-final.save("output.mp4")
+edit.validate()                  # dry-run via metadata, no frames loaded
+edit.run_to_file("output.mp4")   # stream to disk, ~constant memory
 ```
+`run_to_file()` pipes ffmpeg decode → per-frame effects → ffmpeg encode,
+so memory stays bounded even for hour-long sources. Use `edit.run()`
+instead if you want the result back in memory as a `Video`.
 ### AI generation
 ```python
 from videopython.ai import TextToImage, ImageToVideo, TextToSpeech
+from videopython.base import Resize
 image = TextToImage().generate_image("A cinematic mountain sunrise")
-video = ImageToVideo().generate_video(image=image).resize(1080, 1920)
+video = ImageToVideo().generate_video(image=image)
 audio = TextToSpeech().generate_audio("Welcome to videopython.")
+video = Resize(width=1080, height=1920).apply(video)
 video.add_audio(audio).save("ai_video.mp4")
 ```
 ## LLM & AI Agent Integration
-videopython is designed to be controlled by LLMs. Every video operation exposes a machine-readable spec with descriptions, parameter types, and value constraints - all available as JSON Schema at runtime.
+The library is built for LLM-driven editing. Two surfaces matter:
-**Schema generation** - `VideoEdit.json_schema()` returns a complete JSON Schema describing valid edit plans. Pass it directly as a tool schema or structured-output format to any LLM API:
+**1. Plan schema for tool / structured-output calls.**
+`VideoEdit.json_schema()` returns a JSON Schema covering segments,
+`post_operations`, and a discriminated union over every registered
+`Operation`. Drop it into any LLM API:
 ```python
 from videopython.editing import VideoEdit
 schema = VideoEdit.json_schema()
-# Pass `schema` to your LLM as a function/tool definition or response format.
-# The LLM generates a plan dict, then:
+# Anthropic: tools=[{"name": "edit", "input_schema": schema}]
+# OpenAI:    tools=[{"type": "function",
+#                    "function": {"name": "edit", "parameters": schema}}]
+```
+Validate the LLM's output without touching the filesystem, then run it:
+```python
 edit = VideoEdit.from_dict(plan)
-edit.validate()   # dry-run: checks sources, time ranges, params - no frames loaded
-final = edit.run()
-final.save("output.mp4")
+edit.validate()                  # catches bad ops, time ranges, fps mismatches
+edit.run_to_file("output.mp4")
 ```
-**Operation discovery** - the registry lets an LLM (or your code) inspect all available operations, their parameters, and constraints:
+**2. Operation discovery for agent loops.**
+Every registered op exposes its own Pydantic schema, so an agent can
+introspect what's available without hardcoded lists:
 ```python
-from videopython.base import get_operation_specs, get_specs_by_category, OperationCategory
+from videopython.base import Operation, OpCategory
-all_ops = get_operation_specs()                                    # all registered operations
-transforms = get_specs_by_category(OperationCategory.TRANSFORMATION)  # just transforms
+for op_id, cls in Operation.registry().items():
+    print(f"{op_id}: {(cls.__doc__ or '').splitlines()[0]}")
-spec = all_ops["color_adjust"]
-print(spec.description)       # LLM-friendly docstring
-print(spec.to_json_schema())  # {"brightness": {"type": "number", "minimum": -1, "maximum": 1}, ...}
+schema = Operation.get("color_adjust").model_json_schema()  # per-op schema
 ```
-Every operation has LLM-optimized descriptions and rich constraints (`minimum`, `maximum`, `enum`, `exclusive_minimum`, etc.) so models generate valid parameters on the first try.
+Field constraints (`minimum`, `maximum`, `enum`, `exclusiveMinimum`,
+nullability) flow through to the schema, so LLMs that support
+constrained generation produce valid parameters on the first try.
+For ops that need side-channel data (e.g. `silence_removal` and
+`add_subtitles` need a `Transcription`), pass it via `context`:
+```python
+edit.run(context={"transcription": my_transcription})
+```
-Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operation Registry](https://videopython.com/api/registry/)
+Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operations](https://videopython.com/api/operations/) | [LLM Integration Guide](https://videopython.com/guides/llm-integration/)
 ## Features
@@ -131,16 +161,15 @@ Docs: [Editing Plans](https://videopython.com/api/editing/) | [Operation Registr
 | Area | Highlights |
 |---|---|
 | **Video I/O** | `Video`, `VideoMetadata`, `FrameIterator` - load, save, inspect |
-| **Editing plans** | `VideoEdit`, `SegmentConfig` - JSON/LLM-friendly multi-segment plans with full JSON Schema generation, dry-run validation, and operation registry |
-| **Multicam editing** | `MultiCamEdit`, `CutPoint` - switch between synchronized camera angles with transitions, replace audio with external track |
-| **Transforms** | Cut (time/frame), resize, crop, FPS resampling, speed change, picture-in-picture, reverse, freeze frame, silence removal |
-| **Transitions** | `FadeTransition`, `BlurTransition`, `InstantTransition` |
+| **Operation foundation** | `Operation`, `Effect`, `TimeRange`, `OpCategory` - Pydantic base + auto-registry + discriminated-union schema |
+| **Editing plans** | `VideoEdit`, `SegmentConfig` - JSON/LLM-friendly multi-segment plans with JSON Schema generation, dry-run validation, and streaming `run_to_file` |
+| **Transforms** | Cut (time/frame), resize, crop, FPS resampling, speed change, reverse, freeze frame, silence removal |
 | **Effects** | Blur, zoom, color grading, vignette, Ken Burns, image overlay, fade, text overlay, volume adjust |
 | **Audio** | Load/save, overlay, concat, normalize, time-stretch, silence detection, segment classification |
 | **Text** | Transcription data classes, `TranscriptionOverlay` for subtitle rendering |
 | **Scene detection** | Histogram-based scene boundaries (`detect`, `detect_streaming`, `detect_parallel`) |
-API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopython.com/api/core/video/) | [Audio](https://videopython.com/api/core/audio/) | [Editing Plans](https://videopython.com/api/editing/) | [Transforms](https://videopython.com/api/transforms/) | [Transitions](https://videopython.com/api/transitions/) | [Effects](https://videopython.com/api/effects/) | [Text](https://videopython.com/api/text/)
+API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopython.com/api/core/video/) | [Audio](https://videopython.com/api/core/audio/) | [Editing Plans](https://videopython.com/api/editing/) | [Operations](https://videopython.com/api/operations/) | [Transforms](https://videopython.com/api/transforms/) | [Effects](https://videopython.com/api/effects/) | [Text](https://videopython.com/api/text/)
 ### `videopython.ai` - local AI features (install with `[ai]`)
@@ -150,7 +179,7 @@ API docs: [Core](https://videopython.com/api/index/) | [Video](https://videopyth
 | **Understanding** | `AudioToText` (transcription), `AudioClassifier`, `SceneVLM` (structured visual scene description), `FaceTracker` (per-shot face tracks) |
 | **Scene detection** | `SemanticSceneDetector` (neural scene boundaries) |
 | **Video analysis** | `VideoAnalyzer` - full-pipeline analysis combining multiple AI capabilities |
-| **Transforms** | `FaceTrackingCrop`, `SplitScreenComposite` |
+| **Transforms** | `FaceTrackingCrop` |
 | **Dubbing** | `VideoDubber` - voice cloning and revoicing with timing sync |
 API docs: [Generation](https://videopython.com/api/ai/generation/) | [Understanding](https://videopython.com/api/ai/understanding/) | [Transforms](https://videopython.com/api/ai/transforms/) | [Dubbing](https://videopython.com/api/ai/dubbing/)
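
Note on the "auto-registry + discriminated-union" wording in the README hunks above: the general pattern can be sketched with the standard library alone. This sketch swaps in `dataclasses` and `__init_subclass__` in place of Pydantic, and every name in it (`SketchOperation`, `SketchResize`, `from_dict`) is hypothetical. The real `Operation` base class is not shown in this part of the diff, so this illustrates the idea only and may differ from how videopython implements it.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict


class SketchOperation:
    """Stand-in base class for illustration; NOT videopython's Operation."""

    op: ClassVar[str]
    _registry: ClassVar[Dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Each subclass registers itself under its declared op id.
        SketchOperation._registry[cls.op] = cls

    @classmethod
    def registry(cls) -> Dict[str, type]:
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(cls._registry)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchOperation":
        # The "op" key is the discriminator that selects the subclass;
        # the remaining keys are passed through as that subclass's fields.
        payload = dict(data)
        op_cls = cls._registry[payload.pop("op")]
        return op_cls(**payload)


@dataclass
class SketchResize(SketchOperation):
    op: ClassVar[str] = "resize"
    width: int
    height: int


resize = SketchOperation.from_dict({"op": "resize", "width": 1080, "height": 1920})
print(sorted(SketchOperation.registry()), resize)
```

The point of the pattern is that adding a new operation class is enough to make it discoverable, with no central list to update.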

{videopython-0.30.0 → videopython-0.31.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "videopython"
-version = "0.30.0"
+version = "0.31.0"
 description = "Minimal video generation and processing library."
 authors = [
     { name = "Bartosz Wójtowicz", email = "bartoszwojtowicz@outlook.com" },
@@ -136,6 +136,7 @@ Documentation = "https://videopython.com"
 [tool.mypy]
 mypy_path = "src/stubs"
+plugins = ["pydantic.mypy"]
 [[tool.mypy.overrides]]
 module = [

{videopython-0.30.0 → videopython-0.31.0}/src/videopython/ai/__init__.py RENAMED

@@ -1,7 +1,5 @@
-from videopython.ai import registry as _ai_registry  # noqa: F401
 from .generation import ImageToVideo, TextToImage, TextToMusic, TextToSpeech, TextToVideo
-from .transforms import FaceTrackingCrop, SplitScreenComposite
+from .transforms import FaceTrackingCrop
 from .understanding import (
     AudioClassifier,
     AudioToText,
@@ -26,7 +24,6 @@ __all__ = [
     "SemanticSceneDetector",
     # Transforms (AI-powered)
     "FaceTrackingCrop",
-    "SplitScreenComposite",
     # Video analysis
     "VideoAnalysis",
     "VideoAnalysisConfig",

{videopython-0.30.0 → videopython-0.31.0}/src/videopython/ai/dubbing/dubber.py RENAMED

@@ -292,7 +292,9 @@ class VideoDubber:
         video_duration = video.total_seconds
         if video_duration > speech_duration:
-            output_video = video.cut(0, speech_duration)
+            from videopython.base.transforms import CutSeconds
+            output_video = CutSeconds(start=0, end=speech_duration).apply(video)
         else:
             output_video = video

videopython-0.31.0/src/videopython/ai/transforms.py ADDED

@@ -0,0 +1,193 @@
+"""AI-powered video transforms that require face detection."""
+from __future__ import annotations
+import logging
+from typing import ClassVar, Literal
+import cv2
+import numpy as np
+from pydantic import Field
+from tqdm import tqdm
+from videopython.ai.understanding.faces import FaceTracker
+from videopython.base.operation import OpCategory, Operation
+from videopython.base.video import Video
+logger = logging.getLogger(__name__)
+def _make_even(value: int) -> int:
+    """Round down to nearest even number for H.264 compatibility."""
+    return value - (value % 2)
+__all__ = [
+    "FaceTrackingCrop",
+]
+class FaceTrackingCrop(Operation):
+    """Crops video to follow detected faces.
+    Useful for creating vertical (9:16) content from horizontal (16:9) video
+    by tracking the speaker's face and keeping it framed.
+    Supports GPU acceleration for faster processing with optional frame sampling
+    and simple cinematographic framing rules (headroom / thirds) plus optional
+    movement speed clamping.
+    """
+    op: Literal["face_crop"] = "face_crop"
+    category: ClassVar[OpCategory] = OpCategory.TRANSFORM
+    target_aspect: tuple[int, int] = Field((9, 16), description="Output aspect ratio as (width, height).")
+    face_selection: Literal["largest", "centered", "index"] = Field(
+        "largest", description="Strategy for selecting which face to track."
+    )
+    face_index: int = Field(0, ge=0, description='Index of face to track when using ``face_selection="index"``.')
+    padding: float = Field(0.3, ge=0, description="Extra space around face (0.3 = 30% padding on each side).")
+    vertical_offset: float = Field(
+        -0.1, description='Legacy vertical position offset used by ``framing_rule="offset"``.'
+    )
+    framing_rule: Literal["offset", "center", "headroom", "thirds", "dynamic"] = Field(
+        "offset",
+        description=(
+            'Subject framing strategy. "offset": legacy ``vertical_offset`` behavior; '
+            '"center": keep face centered; "headroom": extra room above the face; '
+            '"thirds": face near the upper-third line; "dynamic": currently same as "headroom".'
+        ),
+    )
+    headroom: float = Field(0.15, description="Headroom amount for framing rules that use it.")
+    smoothing: float = Field(0.8, ge=0, le=1, description="Position smoothing factor (0-1, higher = smoother).")
+    max_speed: float | None = Field(None, gt=0, description="Optional max camera movement per frame (normalized).")
+    fallback: Literal["center", "last_position", "full_frame"] = Field(
+        "last_position", description="Behavior when no face detected."
+    )
+    detection_interval: int = Field(3, ge=1, description="Frames between face detections.")
+    backend: Literal["cpu", "gpu", "auto"] = Field("auto", description='Detection backend - "cpu", "gpu", or "auto".')
+    sample_rate: int = Field(1, ge=1, description="For GPU backend, detect every Nth frame and interpolate.")
+    def _apply_framing_offset(self, face_cx: float, face_cy: float, face_h: float) -> tuple[float, float]:
+        if self.framing_rule == "offset":
+            return (face_cx, face_cy + self.vertical_offset)
+        if self.framing_rule == "center":
+            return (face_cx, face_cy)
+        if self.framing_rule == "headroom":
+            return (face_cx, face_cy - self.headroom)
+        if self.framing_rule == "thirds":
+            return (face_cx, face_cy - (1 / 3 - 0.5))
+        # "dynamic" — placeholder until motion/look-direction framing is implemented.
+        return (face_cx, face_cy - self.headroom)
+    def _clamp_speed(self, current: tuple[float, float], target: tuple[float, float]) -> tuple[float, float]:
+        if self.max_speed is None:
+            return target
+        dx = target[0] - current[0]
+        dy = target[1] - current[1]
+        distance = (dx**2 + dy**2) ** 0.5
+        if distance <= self.max_speed or distance == 0:
+            return target
+        scale = self.max_speed / distance
+        return (current[0] + dx * scale, current[1] + dy * scale)
+    def _calculate_crop_region(
+        self,
+        face_cx: float,
+        face_cy: float,
+        face_w: float,
+        face_h: float,
+        frame_w: int,
+        frame_h: int,
+        center_position: tuple[float, float] | None = None,
+    ) -> tuple[int, int, int, int]:
+        target_ratio = self.target_aspect[0] / self.target_aspect[1]
+        frame_ratio = frame_w / frame_h
+        if target_ratio < frame_ratio:
+            crop_h = _make_even(frame_h)
+            crop_w = _make_even(int(crop_h * target_ratio))
+        else:
+            crop_w = _make_even(frame_w)
+            crop_h = _make_even(int(crop_w / target_ratio))
+        min_face_dim = max(face_w * frame_w, face_h * frame_h)
+        min_crop_dim = min_face_dim * (1 + 2 * self.padding)
+        if crop_w < min_crop_dim * target_ratio:
+            crop_w = _make_even(min(int(min_crop_dim * target_ratio), frame_w))
+            crop_h = _make_even(min(int(crop_w / target_ratio), frame_h))
+        if center_position is None:
+            center_position = self._apply_framing_offset(face_cx, face_cy, face_h)
+        center_x = center_position[0] * frame_w
+        center_y = center_position[1] * frame_h
+        x = int(center_x - crop_w / 2)
+        y = int(center_y - crop_h / 2)
+        x = max(0, min(x, frame_w - crop_w))
+        y = max(0, min(y, frame_h - crop_h))
+        return (x, y, crop_w, crop_h)
+    def apply(self, video: Video) -> Video:
+        tracker = FaceTracker(
+            selection_strategy=self.face_selection,
+            face_index=self.face_index,
+            smoothing=self.smoothing,
+            detection_interval=self.detection_interval,
+            backend=self.backend,
+            sample_rate=self.sample_rate,
+        )
+        h, w = video.frame_shape[:2]
+        target_ratio = self.target_aspect[0] / self.target_aspect[1]
+        if target_ratio < w / h:
+            out_h = _make_even(h)
+            out_w = _make_even(int(out_h * target_ratio))
+        else:
+            out_w = _make_even(w)
+            out_h = _make_even(int(out_w / target_ratio))
+        default_x = (w - out_w) // 2
+        default_y = (h - out_h) // 2
+        last_crop = (default_x, default_y, out_w, out_h)
+        current_position = (0.5, 0.5)
+        framing_label = self.framing_rule if self.framing_rule != "offset" else "legacy-offset"
+        logger.info(
+            "Face tracking crop: %dx%d -> %dx%d (%d:%d, framing=%s)",
+            w,
+            h,
+            out_w,
+            out_h,
+            self.target_aspect[0],
+            self.target_aspect[1],
+            framing_label,
+        )
+        new_frames = []
+        for i in tqdm(range(len(video.frames)), desc="Face tracking crop"):
+            frame = video.frames[i]
+            face_info = tracker.detect_and_track(frame, i)
+            if face_info:
+                cx, cy, fw, fh = face_info
+                target_position = self._apply_framing_offset(cx, cy, fh)
+                current_position = self._clamp_speed(current_position, target_position)
+                crop = self._calculate_crop_region(cx, cy, fw, fh, w, h, center_position=current_position)
+                last_crop = crop
+            else:
+                if self.fallback == "center":
+                    crop = (default_x, default_y, out_w, out_h)
+                elif self.fallback == "last_position":
+                    crop = last_crop
+                else:  # full_frame
+                    crop = (0, 0, w, h)
+            x, y, cw, ch = crop
+            cropped = frame[y : y + ch, x : x + cw]
+            if cropped.shape[1] != out_w or cropped.shape[0] != out_h:
+                cropped = cv2.resize(cropped, (out_w, out_h), interpolation=cv2.INTER_AREA)
+            new_frames.append(cropped)
+        video.frames = np.array(new_frames, dtype=np.uint8)
+        return video
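
Note on the arithmetic in `FaceTrackingCrop` above: the output size calculation and the per-frame speed clamp can be tried in isolation. The sketch below re-types both as free functions with no OpenCV or videopython dependency. The function names `output_size` and `clamp_speed` are illustrative, and positions are assumed to be normalized to the 0-1 range, as the `max_speed` field description suggests.

```python
from typing import Optional, Tuple


def make_even(value: int) -> int:
    """Round down to the nearest even number (mirrors `_make_even`)."""
    return value - (value % 2)


def output_size(frame_w: int, frame_h: int, target_aspect: Tuple[int, int]) -> Tuple[int, int]:
    """Largest even-sized crop of the frame that has the target aspect ratio."""
    target_ratio = target_aspect[0] / target_aspect[1]
    if target_ratio < frame_w / frame_h:
        # Target is narrower than the source: keep full height, trim width.
        out_h = make_even(frame_h)
        out_w = make_even(int(out_h * target_ratio))
    else:
        # Target is wider than (or equal to) the source: keep full width.
        out_w = make_even(frame_w)
        out_h = make_even(int(out_w / target_ratio))
    return out_w, out_h


def clamp_speed(
    current: Tuple[float, float],
    target: Tuple[float, float],
    max_speed: Optional[float],
) -> Tuple[float, float]:
    """Move from current toward target by at most max_speed per frame."""
    if max_speed is None:
        return target
    dx = target[0] - current[0]
    dy = target[1] - current[1]
    distance = (dx**2 + dy**2) ** 0.5
    if distance <= max_speed or distance == 0:
        return target
    scale = max_speed / distance
    return (current[0] + dx * scale, current[1] + dy * scale)


print(output_size(1920, 1080, (9, 16)))  # (606, 1080)
print(clamp_speed((0.5, 0.5), (0.8, 0.9), 0.1))  # roughly (0.56, 0.58)
```

For a 1920x1080 source and a 9:16 target, the width comes out as 606 rather than 607 because of the even-number rounding.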

{videopython-0.30.0 → videopython-0.31.0}/src/videopython/ai/understanding/faces.py RENAMED

@@ -1,8 +1,8 @@
 """Face detection and per-shot tracking for the understanding layer.
 Lifted from ``ai/transforms.py`` so analysis code (``VideoAnalyzer``) and
-transforms (``FaceTrackingCrop`` / ``SplitScreenComposite``) can share a
-single source. M6 lip-sync also consumes this directly.
+transforms (``FaceTrackingCrop``) can share a single source. M6 lip-sync
+also consumes this directly.
 Tracking is IoU-only — no embedding re-id. Tracks do not survive across
 shot/scene boundaries; a shot here means a ``SceneBoundary`` produced by
@@ -167,9 +167,8 @@ class FaceTracker:
     Two surfaces:
     - ``detect_and_track(frame, frame_index)`` / ``track_video(frames)`` —
-      legacy single-subject API used by ``FaceTrackingCrop`` /
-      ``SplitScreenComposite``. Returns a smoothed
-      ``(cx, cy, w, h)`` tuple.
+      legacy single-subject API used by ``FaceTrackingCrop``. Returns a
+      smoothed ``(cx, cy, w, h)`` tuple.
     - ``track_shot(frames, frame_indices)`` — new per-shot multi-track API
       returning ``list[FaceTrack]``. Used by the analysis pipeline (M5)
       and lip-sync (M6) to bind detections to subjects across the
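
Note on the "IoU-only" tracking mentioned in the module docstring above: intersection over union scores how much two boxes overlap, and a tracker of this kind typically assigns a new detection to the existing track whose last box overlaps it most. A minimal sketch of the metric follows. The corner-based `(x1, y1, x2, y2)` box format and the function name `iou` are assumptions for illustration; the matching code inside `FaceTracker` is not part of this diff and may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: overlap area 1, union area 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0: disjoint boxes
```

Because the score depends only on geometry, it carries no notion of identity, which is consistent with the docstring's remark that tracks do not survive shot boundaries.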
