polysync 0.1.0__tar.gz

polysync-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 王建硕 (Jian Shuo Wang)
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,115 @@
+ Metadata-Version: 2.4
+ Name: polysync
+ Version: 0.1.0
+ Summary: Multicam audio sync and director-style auto-edit — align N angles of one event by audio cross-correlation, then cut/PiP them into one MP4. Reversible sidecars, never re-encodes the originals.
+ Author: 王建硕 (Jian Shuo Wang)
+ License: MIT
+ Project-URL: Homepage, https://github.com/jianshuo/polysync
+ Project-URL: Issues, https://github.com/jianshuo/polysync/issues
+ Keywords: multicam,audio-sync,video-editing,cross-correlation,ffmpeg,picture-in-picture,podcast,interview
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Topic :: Multimedia :: Video
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.21
+ Requires-Dist: scipy>=1.7
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7; extra == "dev"
+ Dynamic: license-file
+
+ # polysync
+
+ **Multicam audio sync + director-style auto-edit.** Align N recordings of the
+ same event by audio cross-correlation, then cut or picture-in-picture them into
+ a single MP4 — driven entirely by who's talking.
+
+ What makes it different from "yet another sync tool":
+
+ - **Reversible sidecars, never re-encodes the originals.** Sync writes a tiny
+   `<input>.sync.json` next to each file holding a single offset. A 75-min 4K
+   3-camera shoot is 250+ GB; baking offsets into re-encoded copies would double
+   that and lose quality. Downstream applies the offset with `ffmpeg -itsoffset`
+   at consume time. Originals are touched read-only, always.
+ - **Envelope cross-correlation, not raw waveform.** Matches the log-energy
+   envelope, which both mics hear regardless of their frequency response — robust
+   even when a second camera's on-board mic sounds nothing like the main one.
+ - **Clock-drift aware.** Cheap recorders drift 5–50 ppm; polysync fits the drift
+   across the recording and reports it separately, so long-form lip-sync can
+   correct it while camera-cut editing can ignore it.
+ - **Handles the messy real cases.** Auto-picks the loudest audio track (pro
+   cameras often leave track 1 dead), supports partial-coverage clips that only
+   span part of the session, and can independently verify the result.
+
+ ## Install
+
+ ```bash
+ pip install polysync # once published
+ # or, from a checkout:
+ pip install -e ".[dev]"
+ ```
+
+ Requires **Python ≥ 3.9** and **ffmpeg / ffprobe** on your `PATH`
+ (`brew install ffmpeg`, `apt install ffmpeg`, …). Python deps: `numpy`, `scipy`.
+
+ ## Quickstart
+
+ ```bash
+ # 1. Sync each angle to a reference camera (writes <file>.sync.json sidecars)
+ polysync sync CAM_A.mp4 CAM_B.mxf
+ polysync sync CAM_A.mp4 CAM_C.mxf
+
+ # 2. (optional) Verify the alignment — re-checks residual independently
+ polysync verify CAM_A.mp4 CAM_B.mxf CAM_B.mxf.sync.json
+
+ # 3. Build an auto-edit decision list (who's on screen each second)
+ polysync edit CAM_A.mp4 CAM_B.mxf CAM_C.mxf --out edl.json
+
+ # 4. Render — hard cuts, or with a picture-in-picture inset
+ polysync render-cuts edl.json --out final.mp4
+ polysync render-pip edl.json --out final.mp4 --pip bottom-right
+ ```
+
+ A clip that only covers **part** of the session (a Riverside / phone / lavalier
+ recording that started mid-way):
+
+ ```bash
+ polysync sync REFERENCE.mp4 PARTIAL.m4a --partial
+ ```
+
+ ## How it consumes the sidecar
+
+ `delta_seconds` is the source's `t=0` in the reference's timeline (positive =
+ source starts later). To align by hand:
+
+ ```bash
+ ffmpeg -itsoffset $(jq -r .delta_seconds CAM_B.mxf.sync.json) -i CAM_B.mxf \
+   -i CAM_A.mp4 -filter_complex "[0:v][1:v]hstack" out.mp4
+ ```
+
+ The `edit` / `render-*` commands read every sidecar automatically.
+
+ ## Python API
+
+ ```python
+ from polysync import compute_sync # pure-numpy core, unit-testable
+ from polysync.sync import sync_files # file → sidecar
+ from polysync.verify import verify_files
+ from polysync.edit import build_edl
+ ```
+
+ ## Status
+
+ Beta (0.1). Sync + verify are battle-tested on real Sony FX3/FX6 multicam
+ interview footage; the auto-edit is audio-energy-driven (no face detection).
+ Issues and PRs welcome.
+
+ ## License
+
+ MIT © 王建硕 (Jian Shuo Wang)
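
The clock-drift bullet in the README above says polysync fits the drift across the recording and reports it separately. This diff does not show how that fit is done, so the following is only a minimal sketch of one plausible approach: measure the local offset at several probe times, then fit a straight line through the points. The function name `fit_drift` and its inputs are hypothetical and are not part of polysync's API.

```python
import numpy as np


def fit_drift(probe_times_s, local_offsets_s):
    """Fit offset(t) = offset_0 + drift * t by least squares.

    Returns (offset_at_t0_seconds, drift_ppm). Hypothetical helper,
    not polysync's own implementation.
    """
    t = np.asarray(probe_times_s, dtype=np.float64)
    d = np.asarray(local_offsets_s, dtype=np.float64)
    # np.polyfit with degree 1 returns (slope, intercept).
    slope, intercept = np.polyfit(t, d, 1)
    # The slope is seconds of offset per second of recording; scale to ppm.
    return float(intercept), float(slope * 1e6)
```

For scale: a 20 ppm drift accumulates to 4500 s × 20e-6 = 0.09 s over a 75-minute recording, which is why lip-sync work needs the correction while coarse camera cuts may not.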
@@ -0,0 +1,89 @@
+ # polysync
+
+ **Multicam audio sync + director-style auto-edit.** Align N recordings of the
+ same event by audio cross-correlation, then cut or picture-in-picture them into
+ a single MP4 — driven entirely by who's talking.
+
+ What makes it different from "yet another sync tool":
+
+ - **Reversible sidecars, never re-encodes the originals.** Sync writes a tiny
+   `<input>.sync.json` next to each file holding a single offset. A 75-min 4K
+   3-camera shoot is 250+ GB; baking offsets into re-encoded copies would double
+   that and lose quality. Downstream applies the offset with `ffmpeg -itsoffset`
+   at consume time. Originals are touched read-only, always.
+ - **Envelope cross-correlation, not raw waveform.** Matches the log-energy
+   envelope, which both mics hear regardless of their frequency response — robust
+   even when a second camera's on-board mic sounds nothing like the main one.
+ - **Clock-drift aware.** Cheap recorders drift 5–50 ppm; polysync fits the drift
+   across the recording and reports it separately, so long-form lip-sync can
+   correct it while camera-cut editing can ignore it.
+ - **Handles the messy real cases.** Auto-picks the loudest audio track (pro
+   cameras often leave track 1 dead), supports partial-coverage clips that only
+   span part of the session, and can independently verify the result.
+
+ ## Install
+
+ ```bash
+ pip install polysync # once published
+ # or, from a checkout:
+ pip install -e ".[dev]"
+ ```
+
+ Requires **Python ≥ 3.9** and **ffmpeg / ffprobe** on your `PATH`
+ (`brew install ffmpeg`, `apt install ffmpeg`, …). Python deps: `numpy`, `scipy`.
+
+ ## Quickstart
+
+ ```bash
+ # 1. Sync each angle to a reference camera (writes <file>.sync.json sidecars)
+ polysync sync CAM_A.mp4 CAM_B.mxf
+ polysync sync CAM_A.mp4 CAM_C.mxf
+
+ # 2. (optional) Verify the alignment — re-checks residual independently
+ polysync verify CAM_A.mp4 CAM_B.mxf CAM_B.mxf.sync.json
+
+ # 3. Build an auto-edit decision list (who's on screen each second)
+ polysync edit CAM_A.mp4 CAM_B.mxf CAM_C.mxf --out edl.json
+
+ # 4. Render — hard cuts, or with a picture-in-picture inset
+ polysync render-cuts edl.json --out final.mp4
+ polysync render-pip edl.json --out final.mp4 --pip bottom-right
+ ```
+
+ A clip that only covers **part** of the session (a Riverside / phone / lavalier
+ recording that started mid-way):
+
+ ```bash
+ polysync sync REFERENCE.mp4 PARTIAL.m4a --partial
+ ```
+
+ ## How it consumes the sidecar
+
+ `delta_seconds` is the source's `t=0` in the reference's timeline (positive =
+ source starts later). To align by hand:
+
+ ```bash
+ ffmpeg -itsoffset $(jq -r .delta_seconds CAM_B.mxf.sync.json) -i CAM_B.mxf \
+   -i CAM_A.mp4 -filter_complex "[0:v][1:v]hstack" out.mp4
+ ```
+
+ The `edit` / `render-*` commands read every sidecar automatically.
+
+ ## Python API
+
+ ```python
+ from polysync import compute_sync # pure-numpy core, unit-testable
+ from polysync.sync import sync_files # file → sidecar
+ from polysync.verify import verify_files
+ from polysync.edit import build_edl
+ ```
+
+ ## Status
+
+ Beta (0.1). Sync + verify are battle-tested on real Sony FX3/FX6 multicam
+ interview footage; the auto-edit is audio-energy-driven (no face detection).
+ Issues and PRs welcome.
+
+ ## License
+
+ MIT © 王建硕 (Jian Shuo Wang)
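
The README's hand-alignment recipe reads `delta_seconds` with `jq` and passes it to `ffmpeg -itsoffset`. The same step could be sketched in Python as below. The package exports `read_sidecar` and `sidecar_path`, but their signatures are not visible in this diff, so this sketch parses the JSON directly. `itsoffset_args` is a hypothetical helper name, and the only sidecar field it relies on is `delta_seconds`, as documented in the README.

```python
import json
from pathlib import Path


def itsoffset_args(source, sidecar=None):
    """Build the ffmpeg input arguments that apply a sidecar's offset.

    Hypothetical helper: returns a list such as
    ["-itsoffset", "<delta>", "-i", "<source>"] for use in a larger command.
    """
    source = Path(source)
    if sidecar is None:
        # Assumption from the README: the sidecar sits next to the source
        # and is named "<source>.sync.json" (e.g. CAM_B.mxf.sync.json).
        sidecar = source.with_name(source.name + ".sync.json")
    data = json.loads(Path(sidecar).read_text(encoding="utf-8"))
    delta = float(data["delta_seconds"])
    # -itsoffset must come before the -i it applies to.
    return ["-itsoffset", "%.6f" % delta, "-i", str(source)]
```

A caller would splice the returned list into a full command, for example `["ffmpeg", *itsoffset_args("CAM_B.mxf"), "-i", "CAM_A.mp4", ...]`, and hand it to `subprocess.run`. The original media file is only ever read.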
@@ -0,0 +1,43 @@
+ [build-system]
+ requires = ["setuptools>=61"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "polysync"
+ version = "0.1.0"
+ description = "Multicam audio sync and director-style auto-edit — align N angles of one event by audio cross-correlation, then cut/PiP them into one MP4. Reversible sidecars, never re-encodes the originals."
+ readme = "README.md"
+ requires-python = ">=3.9"
+ license = { text = "MIT" }
+ authors = [{ name = "王建硕 (Jian Shuo Wang)" }]
+ keywords = ["multicam", "audio-sync", "video-editing", "cross-correlation", "ffmpeg", "picture-in-picture", "podcast", "interview"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Environment :: Console",
+     "Intended Audience :: End Users/Desktop",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.9",
+     "Topic :: Multimedia :: Video",
+     "Topic :: Multimedia :: Sound/Audio :: Analysis",
+ ]
+ dependencies = [
+     "numpy>=1.21",
+     "scipy>=1.7",
+ ]
+
+ [project.optional-dependencies]
+ dev = ["pytest>=7"]
+
+ [project.urls]
+ Homepage = "https://github.com/jianshuo/polysync"
+ Issues = "https://github.com/jianshuo/polysync/issues"
+
+ [project.scripts]
+ polysync = "polysync.cli:main"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
+
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,19 @@
+ """polysync — multicam audio sync + director-style auto-edit.
+
+ Align N recordings of one event by audio cross-correlation (envelope-based,
+ robust at low SNR), emit reversible `.sync.json` sidecars (originals are never
+ re-encoded), then auto-cut / picture-in-picture them into a single MP4.
+
+ Public API:
+     from polysync import compute_sync, SyncResult, SyncError
+     from polysync.sidecar import read_sidecar, write_sidecar
+ """
+ from .sync import compute_sync, SyncResult, SyncError
+ from .sidecar import read_sidecar, write_sidecar, sidecar_path, SCHEMA_VERSION
+
+ __version__ = "0.1.0"
+ __all__ = [
+     "compute_sync", "SyncResult", "SyncError",
+     "read_sidecar", "write_sidecar", "sidecar_path", "SCHEMA_VERSION",
+     "__version__",
+ ]
@@ -0,0 +1,130 @@
+ """Shared audio primitives — the pieces sync, verify, and edit all need.
+
+ Everything here is either pure numpy/scipy (unit-testable without media) or a
+ thin ffmpeg/ffprobe wrapper. Keeping these in one place is the whole reason
+ polysync is a package and not three copy-pasted scripts.
+ """
+ import subprocess
+ from pathlib import Path
+
+ import numpy as np
+ from scipy import signal
+
+
+ def loudest_audio_stream(video_path):
+     """Return the index N of the audio stream (`0:a:N`) with the highest mean
+     volume, probed over a 60 s window starting 300 s into the file.
+
+     Why this matters: pro cameras often record multiple audio tracks where the
+     first one is dead. Sony FX6 MXF clips carry 4 mono PCM tracks and commonly
+     leave a:0 / a:1 silent (~-90 dB) with the real room mic on a:2 / a:3.
+     Hard-coding `0:a:0` would cross-correlate silence and fail to sync, so pick
+     the loudest track instead. Single-stream files (most MP4 cams) short-circuit
+     to a:0.
+     """
+     video_path = Path(video_path)
+     streams = subprocess.run(
+         ["ffprobe", "-v", "error", "-select_streams", "a",
+          "-show_entries", "stream=index", "-of", "csv=p=0", str(video_path)],
+         check=True, capture_output=True, text=True,
+     ).stdout.strip().splitlines()
+     if len(streams) <= 1:
+         return 0
+     best_idx, best_db = 0, -1e9
+     for ch in range(len(streams)):
+         err = subprocess.run(
+             ["ffmpeg", "-nostdin", "-hide_banner", "-ss", "300", "-t", "60",
+              "-i", str(video_path), "-map", "0:a:%d" % ch,
+              "-af", "volumedetect", "-f", "null", "-"],
+             capture_output=True, text=True,
+         ).stderr
+         for line in err.splitlines():
+             if "mean_volume" in line:
+                 try:
+                     db = float(line.split("mean_volume:")[1].strip().split()[0])
+                 except (IndexError, ValueError):
+                     db = -1e9
+                 if db > best_db:
+                     best_db, best_idx = db, ch
+                 break
+     print(" [%s] loudest audio stream: a:%d (%.1f dB)"
+           % (video_path.name, best_idx, best_db))
+     return best_idx
+
+
+ def extract_pcm(video_path, dst, sr, stream=None):
+     """Extract one audio track as mono signed-16 PCM at `sr` Hz.
+
+     `stream` is the `0:a:N` index; if None, auto-select the loudest track.
+     No `-itsoffset` is ever applied here — offsets are pure metadata and are
+     handled by index arithmetic / `-itsoffset` at consume time downstream.
+     """
+     video_path = Path(video_path)
+     ch = loudest_audio_stream(video_path) if stream is None else stream
+     subprocess.run(
+         ["ffmpeg", "-nostdin", "-y", "-i", str(video_path),
+          "-map", "0:a:%d" % ch, "-ac", "1", "-ar", str(sr),
+          "-f", "s16le", str(dst)],
+         check=True, stderr=subprocess.DEVNULL,
+     )
+
+
+ def read_pcm(path):
+     """Read a raw s16le file into a float32 array."""
+     return np.fromfile(str(path), dtype=np.int16).astype(np.float32)
+
+
+ def media_duration(path):
+     """Container duration in seconds, via ffprobe."""
+     out = subprocess.run(
+         ["ffprobe", "-v", "error", "-show_entries", "format=duration",
+          "-of", "default=nw=1:nk=1", str(path)],
+         check=True, capture_output=True, text=True,
+     )
+     return float(out.stdout.strip())
+
+
+ def frame_rms(x, sr, hop_ms=10, win_ms=50):
+     """Sliding-window RMS of `x`. Returns (rms_per_frame, frame_sr_hz).
+
+     Uses a cumulative-sum trick so it's O(n) regardless of window size. This is
+     the shared primitive behind both the sync envelope (log of this, high-passed)
+     and the edit per-second loudness.
+     """
+     hop = int(sr * hop_ms / 1000)
+     win = int(sr * win_ms / 1000)
+     n = (len(x) - win) // hop + 1
+     if n <= 0:
+         return np.zeros(0, dtype=np.float32), sr / hop
+     sq = x.astype(np.float64) ** 2
+     csq = np.concatenate([[0.0], np.cumsum(sq)])
+     out = np.empty(n, dtype=np.float32)
+     for i in range(n):
+         s = i * hop
+         out[i] = np.sqrt(max(1e-9, (csq[s + win] - csq[s]) / win))
+     return out, sr / hop
+
+
+ def log_envelope(x, sr, hop_ms=10, win_ms=50, highpass_hz=0.05):
+     """Log-energy envelope, high-passed to strip slow gain/drift offsets.
+
+     This is what sync cross-correlates: it captures dialogue/music dynamics
+     that BOTH mics hear regardless of their frequency response — the reason
+     the matcher is robust even when the two cameras have very different mics.
+     """
+     rms, fsr = frame_rms(x, sr, hop_ms, win_ms)
+     env = np.log(rms + 1e-3)
+     if highpass_hz:
+         env = highpass(env, fsr, highpass_hz)
+     return env, fsr
+
+
+ def highpass(x, fs, cut_hz=0.05):
+     sos = signal.butter(2, cut_hz, btype="high", fs=fs, output="sos")
+     return signal.sosfiltfilt(sos, x).astype(np.float32)
+
+
+ def normalize(x):
+     x = x - x.mean()
+     s = x.std()
+     return x / s if s > 0 else x
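
The primitives above (`frame_rms`, `log_envelope`, `normalize`) feed the cross-correlation step, which lives in a `sync` module that this diff does not show. As a rough illustration only, the sketch below shows how an offset could be estimated from two log-energy envelopes. The names `frame_rms_vec`, `envelope`, and `estimate_offset` are hypothetical. The sketch skips the high-pass filter and any drift fit, and it is not polysync's `compute_sync`.

```python
import numpy as np
from scipy import signal


def frame_rms_vec(x, sr, hop_ms=10, win_ms=50):
    """Sliding-window RMS via the cumulative-sum idea, without a Python loop."""
    hop = int(sr * hop_ms / 1000)
    win = int(sr * win_ms / 1000)
    n = (len(x) - win) // hop + 1
    if n <= 0:
        return np.zeros(0, dtype=np.float32), sr / hop
    csq = np.concatenate([[0.0], np.cumsum(np.asarray(x, dtype=np.float64) ** 2)])
    starts = np.arange(n) * hop
    mean_sq = (csq[starts + win] - csq[starts]) / win
    return np.sqrt(np.maximum(1e-9, mean_sq)).astype(np.float32), sr / hop


def envelope(x, sr):
    """Log-energy envelope, normalized to zero mean and unit variance."""
    rms, fsr = frame_rms_vec(x, sr)
    env = np.log(rms + 1e-3)
    env = env - env.mean()
    std = env.std()
    return (env / std if std > 0 else env), fsr


def estimate_offset(ref, src, sr):
    """Estimate where `src` starts on `ref`'s timeline, in seconds.

    Positive means the source starts later than the reference, matching
    the `delta_seconds` sign convention described in the README.
    """
    e_ref, fsr = envelope(ref, sr)
    e_src, _ = envelope(src, sr)
    corr = signal.correlate(e_ref, e_src, mode="full", method="fft")
    lags = signal.correlation_lags(len(e_ref), len(e_src), mode="full")
    # The lag of the correlation peak is the offset in envelope frames.
    return lags[np.argmax(corr)] / fsr
```

The result is quantized to the envelope frame rate (10 ms with the default hop). A real implementation would presumably refine the peak and guard against weak or ambiguous correlations.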
@@ -0,0 +1,79 @@
+ """`polysync` command-line entry point.
+
+   polysync sync        REFERENCE SOURCE [--partial]
+   polysync verify      REFERENCE SOURCE SIDECAR [--apply-drift] [--step SECONDS]
+   polysync edit        IN1 IN2 ... --out edl.json [--mode rotation|greedy]
+   polysync render-cuts EDL --out out.mp4
+   polysync render-pip  EDL --out out.mp4 [--pip bottom-right]
+ """
+ import argparse
+ import sys
+
+ from . import __version__
+ from .sync import sync_files, SyncError
+ from .verify import verify_files
+ from .edit import autoedit, render_cuts, render_pip
+
+ USAGE = __doc__
+
+
+ def _cmd_sync(argv):
+     ap = argparse.ArgumentParser(prog="polysync sync")
+     ap.add_argument("reference", help="Reference recording (defines the timeline)")
+     ap.add_argument("source", help="Source to align to the reference")
+     ap.add_argument("--partial", action="store_true",
+                     help="Lenient mode for a source covering only part of the "
+                          "reference's span; degrades gracefully, writes only the "
+                          "source sidecar.")
+     args = ap.parse_args(argv)
+     try:
+         sync_files(args.reference, args.source, partial=args.partial)
+     except SyncError as e:
+         print("ERROR: %s" % e, file=sys.stderr)
+         return 1
+     return 0
+
+
+ def _cmd_verify(argv):
+     ap = argparse.ArgumentParser(prog="polysync verify")
+     ap.add_argument("reference")
+     ap.add_argument("source")
+     ap.add_argument("sidecar", help="The source's <source>.sync.json")
+     ap.add_argument("--apply-drift", action="store_true")
+     ap.add_argument("--step", type=float, default=600.0,
+                     help="Probe spacing in seconds (default: 600, i.e. 10 min)")
+     args = ap.parse_args(argv)
+     try:
+         passed, _ = verify_files(args.reference, args.source, args.sidecar,
+                                  step=args.step, apply_drift=args.apply_drift)
+     except ValueError as e:
+         print("ERROR: %s" % e, file=sys.stderr)
+         return 2
+     return 0 if passed else 1
+
+
+ def main(argv=None):
+     argv = list(sys.argv[1:] if argv is None else argv)
+     if not argv or argv[0] in ("-h", "--help", "help"):
+         print(USAGE)
+         return 0
+     if argv[0] in ("-V", "--version"):
+         print("polysync %s" % __version__)
+         return 0
+
+     cmd, rest = argv[0], argv[1:]
+     dispatch = {
+         "sync": _cmd_sync,
+         "verify": _cmd_verify,
+         "edit": lambda a: autoedit.main(a) or 0,
+         "render-cuts": lambda a: render_cuts.main(a) or 0,
+         "render-pip": lambda a: render_pip.main(a) or 0,
+     }
+     if cmd not in dispatch:
+         print("Unknown command %r.\n%s" % (cmd, USAGE), file=sys.stderr)
+         return 2
+     return dispatch[cmd](rest)
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
@@ -0,0 +1,9 @@
+ """Director-style multicam auto-edit on top of polysync sidecars.
+
+   autoedit    — build an EDL (which cam is on screen each second) from synced inputs
+   render_cuts — render the EDL to one MP4 (hard cuts)
+   render_pip  — render the EDL with a picture-in-picture inset
+ """
+ from .autoedit import build_edl
+
+ __all__ = ["build_edl"]