talks-reducer 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- talks_reducer-0.1.0/LICENSE +23 -0
- talks_reducer-0.1.0/PKG-INFO +102 -0
- talks_reducer-0.1.0/README.md +64 -0
- talks_reducer-0.1.0/pyproject.toml +33 -0
- talks_reducer-0.1.0/setup.cfg +4 -0
- talks_reducer-0.1.0/talks_reducer/__init__.py +7 -0
- talks_reducer-0.1.0/talks_reducer/__main__.py +8 -0
- talks_reducer-0.1.0/talks_reducer/audio.py +109 -0
- talks_reducer-0.1.0/talks_reducer/chunks.py +92 -0
- talks_reducer-0.1.0/talks_reducer/cli.py +129 -0
- talks_reducer-0.1.0/talks_reducer/ffmpeg.py +269 -0
- talks_reducer-0.1.0/talks_reducer/pipeline.py +244 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/PKG-INFO +102 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/SOURCES.txt +16 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/dependency_links.txt +1 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/entry_points.txt +2 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/requires.txt +4 -0
- talks_reducer-0.1.0/talks_reducer.egg-info/top_level.txt +1 -0
--- /dev/null
+++ talks_reducer-0.1.0/LICENSE
@@ -0,0 +1,23 @@
+MIT License
+
+Copyright (c) 2019 carykh
+Copyright (c) 2020 gegell
+Copyright (c) 2025 Stanislav Popov
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- /dev/null
+++ talks_reducer-0.1.0/PKG-INFO
@@ -0,0 +1,102 @@
+Metadata-Version: 2.4
+Name: talks-reducer
+Version: 0.1.0
+Summary: CLI for speeding up long-form talks by removing silence
+Author: Talks Reducer Maintainers
+License: MIT License
+
+        Copyright (c) 2019 carykh
+        Copyright (c) 2020 gegell
+        Copyright (c) 2025 Stanislav Popov
+
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: audiotsm>=0.1.2
+Requires-Dist: scipy>=1.10.0
+Requires-Dist: numpy<2.0.0,>=1.22.0
+Requires-Dist: tqdm>=4.65.0
+Dynamic: license-file
+
+# Talks Reducer
+Talks Reducer shortens long-form presentations by removing silent gaps and optionally re-encoding them to smaller files. The
+project was renamed from **jumpcutter** to emphasize its focus on conference talks and lectures.
+
+When CUDA-capable hardware is available the pipeline leans on GPU encoders to keep export times low, but it still runs great on
+CPUs.
+
+## Repository Structure
+- `talks_reducer/` — Python package that exposes the CLI and reusable pipeline:
+  - `cli.py` parses arguments and dispatches to the pipeline.
+  - `pipeline.py` orchestrates FFmpeg, audio processing, and temporary assets.
+  - `audio.py` handles audio validation, volume analysis, and phase vocoder processing.
+  - `chunks.py` builds timing metadata and FFmpeg expressions for frame selection.
+  - `ffmpeg.py` discovers the FFmpeg binary, checks CUDA availability, and assembles command strings.
+- `requirements.txt` — Python dependencies for local development.
+- `default.nix` — reproducible environment definition for Nix users.
+- `CONTRIBUTION.md` — development workflow, formatting expectations, and release checklist.
+- `AGENTS.md` — maintainer tips and coding conventions for this repository.
+
+## Example
+- 1h 37m, 571 MB — Original OBS video
+- 1h 19m, 751 MB — Talks Reducer
+- 1h 19m, 171 MB — Talks Reducer `--small`
+
+The `--small` preset applies a 720p video scale and 128 kbps audio bitrate, making it useful for sharing talks over constrained
+connections. Without `--small`, the script aims to preserve original quality while removing silence.
+
+## Highlights
+- Builds on gegell's classic jumpcutter workflow with more efficient frame and audio processing
+- Generates FFmpeg filter graphs instead of writing temporary frames to disk
+- Streams audio transformations in memory to avoid slow intermediate files
+- Accepts multiple inputs or directories of recordings in a single run
+- Provides progress feedback via `tqdm`
+- Automatically detects NVENC availability, so you no longer need to pass `--cuda`
+
+## Processing Pipeline
+1. Validate that each input file contains an audio stream using `ffprobe`.
+2. Extract audio and calculate loudness to identify silent regions.
+3. Stretch the non-silent segments with `audiotsm` to maintain speech clarity.
+4. Stitch the processed audio and video together with FFmpeg, using NVENC if the GPU encoders are detected.
+
+## Recent Updates
+- **October 2025** — Project renamed to *Talks Reducer* across documentation and scripts.
+- **October 2025** — Added `--small` preset with 720p/128 kbps defaults for bandwidth-friendly exports.
+- **October 2025** — Removed the `--cuda` flag; CUDA/NVENC support is now auto-detected.
+- **October 2025** — Improved `--small` encoder arguments to balance size and clarity.
+- **October 2025** — CLI argument parsing fixes to prevent crashes on invalid combinations.
+- **October 2025** — Added example output comparison to the README.
+
+## Quick Start
+1. Install FFmpeg and ensure it is on your `PATH`
+2. Install Talks Reducer with `pip install talks-reducer` (this exposes the `talks-reducer` command)
+3. Inspect available options with `talks-reducer --help`
+4. Process a recording using `talks-reducer /path/to/video`
+
+## Requirements
+- Python 3 with `numpy`, `scipy`, `audiotsm`, and `tqdm`
+- FFmpeg with optional NVIDIA NVENC support for CUDA acceleration
+
+## Contributing
+See `CONTRIBUTION.md` for development setup details and guidance on sharing improvements.
+
+## License
+Talks Reducer is released under the MIT License. See `LICENSE` for the full text.
--- /dev/null
+++ talks_reducer-0.1.0/README.md
@@ -0,0 +1,64 @@
+# Talks Reducer
+Talks Reducer shortens long-form presentations by removing silent gaps and optionally re-encoding them to smaller files. The
+project was renamed from **jumpcutter** to emphasize its focus on conference talks and lectures.
+
+When CUDA-capable hardware is available the pipeline leans on GPU encoders to keep export times low, but it still runs great on
+CPUs.
+
+## Repository Structure
+- `talks_reducer/` — Python package that exposes the CLI and reusable pipeline:
+  - `cli.py` parses arguments and dispatches to the pipeline.
+  - `pipeline.py` orchestrates FFmpeg, audio processing, and temporary assets.
+  - `audio.py` handles audio validation, volume analysis, and phase vocoder processing.
+  - `chunks.py` builds timing metadata and FFmpeg expressions for frame selection.
+  - `ffmpeg.py` discovers the FFmpeg binary, checks CUDA availability, and assembles command strings.
+- `requirements.txt` — Python dependencies for local development.
+- `default.nix` — reproducible environment definition for Nix users.
+- `CONTRIBUTION.md` — development workflow, formatting expectations, and release checklist.
+- `AGENTS.md` — maintainer tips and coding conventions for this repository.
+
+## Example
+- 1h 37m, 571 MB — Original OBS video
+- 1h 19m, 751 MB — Talks Reducer
+- 1h 19m, 171 MB — Talks Reducer `--small`
+
+The `--small` preset applies a 720p video scale and 128 kbps audio bitrate, making it useful for sharing talks over constrained
+connections. Without `--small`, the script aims to preserve original quality while removing silence.
+
+## Highlights
+- Builds on gegell's classic jumpcutter workflow with more efficient frame and audio processing
+- Generates FFmpeg filter graphs instead of writing temporary frames to disk
+- Streams audio transformations in memory to avoid slow intermediate files
+- Accepts multiple inputs or directories of recordings in a single run
+- Provides progress feedback via `tqdm`
+- Automatically detects NVENC availability, so you no longer need to pass `--cuda`
+
+## Processing Pipeline
+1. Validate that each input file contains an audio stream using `ffprobe`.
+2. Extract audio and calculate loudness to identify silent regions.
+3. Stretch the non-silent segments with `audiotsm` to maintain speech clarity.
+4. Stitch the processed audio and video together with FFmpeg, using NVENC if the GPU encoders are detected.
+
+## Recent Updates
+- **October 2025** — Project renamed to *Talks Reducer* across documentation and scripts.
+- **October 2025** — Added `--small` preset with 720p/128 kbps defaults for bandwidth-friendly exports.
+- **October 2025** — Removed the `--cuda` flag; CUDA/NVENC support is now auto-detected.
+- **October 2025** — Improved `--small` encoder arguments to balance size and clarity.
+- **October 2025** — CLI argument parsing fixes to prevent crashes on invalid combinations.
+- **October 2025** — Added example output comparison to the README.
+
+## Quick Start
+1. Install FFmpeg and ensure it is on your `PATH`
+2. Install Talks Reducer with `pip install talks-reducer` (this exposes the `talks-reducer` command)
+3. Inspect available options with `talks-reducer --help`
+4. Process a recording using `talks-reducer /path/to/video`
+
+## Requirements
+- Python 3 with `numpy`, `scipy`, `audiotsm`, and `tqdm`
+- FFmpeg with optional NVIDIA NVENC support for CUDA acceleration
+
+## Contributing
+See `CONTRIBUTION.md` for development setup details and guidance on sharing improvements.
+
+## License
+Talks Reducer is released under the MIT License. See `LICENSE` for the full text.
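Step 2 of the README's processing pipeline (loudness analysis) can be illustrated with a small standalone sketch. It is not the package's code: the function name `loud_frames` and the sample values are made up for illustration, it handles mono samples in plain Python lists rather than NumPy arrays, and the `0.03` default is taken from the `--silent_threshold` help text elsewhere in this diff.

```python
from typing import List, Sequence


def loud_frames(
    samples: Sequence[float],
    samples_per_frame: int,
    silent_threshold: float = 0.03,
) -> List[bool]:
    """Flag each video frame whose peak amplitude reaches the threshold.

    Hypothetical helper: amplitudes are normalised by the loudest sample in
    the whole recording, so the threshold is relative, not absolute.
    """
    peak = max((abs(s) for s in samples), default=0.0) or 1e-9
    flags = []
    for start in range(0, len(samples), samples_per_frame):
        window = samples[start : start + samples_per_frame]
        frame_peak = max(abs(s) for s in window) / peak
        flags.append(frame_peak >= silent_threshold)
    return flags


# Eight made-up samples, two samples per video frame.
samples = [0.0, 0.01, 0.5, -0.8, 0.0, 0.0, 0.02, -0.01]
print(loud_frames(samples, samples_per_frame=2))
# -> [False, True, False, False]
```

Only the second frame counts as sounded here: the quiet frames peak at 1.25% and 2.5% of the loudest sample, both below the 3% threshold.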
--- /dev/null
+++ talks_reducer-0.1.0/pyproject.toml
@@ -0,0 +1,33 @@
+[build-system]
+requires = ["setuptools>=64", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "talks-reducer"
+version = "0.1.0"
+description = "CLI for speeding up long-form talks by removing silence"
+readme = "README.md"
+requires-python = ">=3.9"
+license = { file = "LICENSE" }
+authors = [
+    { name = "Talks Reducer Maintainers" }
+]
+dependencies = [
+    "audiotsm>=0.1.2",
+    "scipy>=1.10.0",
+    "numpy>=1.22.0,<2.0.0",
+    "tqdm>=4.65.0",
+]
+
+[project.scripts]
+talks-reducer = "talks_reducer.cli:main"
+
+[tool.black]
+line-length = 88
+target-version = ["py39"]
+
+[tool.isort]
+profile = "black"
+line_length = 88
+known_first_party = ["talks_reducer"]
+
--- /dev/null
+++ talks_reducer-0.1.0/talks_reducer/audio.py
@@ -0,0 +1,109 @@
+"""Audio processing helpers for the talks reducer pipeline."""
+
+from __future__ import annotations
+
+import math
+import subprocess
+from typing import List, Sequence, Tuple
+
+import numpy as np
+from audiotsm import phasevocoder
+from audiotsm.io.array import ArrayReader, ArrayWriter
+
+
+def get_max_volume(samples: np.ndarray) -> float:
+    """Return the maximum absolute volume in the provided sample array."""
+
+    return float(max(-np.min(samples), np.max(samples)))
+
+
+def is_valid_input_file(filename: str) -> bool:
+    """Check whether ``ffprobe`` recognises the input file and finds an audio stream."""
+
+    command = (
+        'ffprobe -i "{}" -hide_banner -loglevel error -select_streams a'
+        " -show_entries stream=codec_type".format(filename)
+    )
+    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    outs, errs = None, None
+    try:
+        outs, errs = process.communicate(timeout=1)
+    except subprocess.TimeoutExpired:
+        print("Timeout while checking the input file. Aborting. Command:")
+        print(command)
+        process.kill()
+        outs, errs = process.communicate()
+    finally:
+        return len(errs) == 0 and len(outs) > 0
+
+
+def process_audio_chunks(
+    audio_data: np.ndarray,
+    chunks: Sequence[Sequence[int]],
+    samples_per_frame: float,
+    speeds: Sequence[float],
+    audio_fade_envelope_size: int,
+    max_audio_volume: float,
+    *,
+    batch_size: int = 10,
+) -> Tuple[np.ndarray, List[List[int]]]:
+    """Return processed audio and updated chunk timings for the provided chunk list."""
+
+    audio_buffers: List[np.ndarray] = []
+    output_pointer = 0
+    updated_chunks: List[List[int]] = [list(chunk) for chunk in chunks]
+    normaliser = max(max_audio_volume, 1e-9)
+
+    for batch_start in range(0, len(chunks), batch_size):
+        batch_chunks = chunks[batch_start : batch_start + batch_size]
+        batch_audio: List[np.ndarray] = []
+
+        for chunk in batch_chunks:
+            start = int(chunk[0] * samples_per_frame)
+            end = int(chunk[1] * samples_per_frame)
+            audio_chunk = audio_data[start:end]
+
+            if audio_chunk.size == 0:
+                channels = audio_data.shape[1] if audio_data.ndim > 1 else 1
+                batch_audio.append(np.zeros((0, channels)))
+                continue
+
+            reader = ArrayReader(np.transpose(audio_chunk))
+            writer = ArrayWriter(reader.channels)
+            tsm = phasevocoder(reader.channels, speed=speeds[int(chunk[2])])
+            tsm.run(reader, writer)
+            altered_audio_data = np.transpose(writer.data)
+
+            if altered_audio_data.shape[0] < audio_fade_envelope_size:
+                altered_audio_data[:] = 0
+            else:
+                premask = np.arange(audio_fade_envelope_size) / audio_fade_envelope_size
+                mask = np.repeat(
+                    premask[:, np.newaxis], altered_audio_data.shape[1], axis=1
+                )
+                altered_audio_data[:audio_fade_envelope_size] *= mask
+                altered_audio_data[-audio_fade_envelope_size:] *= 1 - mask
+
+            batch_audio.append(altered_audio_data / normaliser)
+
+        for index, chunk in enumerate(batch_chunks):
+            altered_audio_data = batch_audio[index]
+            audio_buffers.append(altered_audio_data)
+
+            end_pointer = output_pointer + altered_audio_data.shape[0]
+            start_output_frame = int(math.ceil(output_pointer / samples_per_frame))
+            end_output_frame = int(math.ceil(end_pointer / samples_per_frame))
+
+            updated_chunks[batch_start + index] = list(chunk[:2]) + [
+                start_output_frame,
+                end_output_frame,
+            ]
+            output_pointer = end_pointer
+
+    if audio_buffers:
+        output_audio_data = np.concatenate(audio_buffers)
+    else:
+        channels = audio_data.shape[1] if audio_data.ndim > 1 else 1
+        output_audio_data = np.zeros((0, channels))
+
+    return output_audio_data, updated_chunks
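The fade envelope that `process_audio_chunks` applies at chunk boundaries can be shown in isolation. The sketch below is a simplified stand-in, not the package's code: `apply_fade` is a hypothetical name, and it works on a mono list of floats instead of a NumPy array so that it runs without third-party dependencies.

```python
from typing import List


def apply_fade(samples: List[float], envelope_size: int) -> List[float]:
    """Linearly fade the first and last `envelope_size` samples of a mono chunk."""
    if len(samples) < envelope_size:
        # As in the diff above, chunks shorter than the envelope are muted.
        return [0.0] * len(samples)
    ramp = [i / envelope_size for i in range(envelope_size)]
    faded = list(samples)
    tail = len(faded) - envelope_size
    for i, gain in enumerate(ramp):
        faded[i] *= gain  # fade in: gain rises from 0 towards 1
        faded[tail + i] *= 1 - gain  # fade out: gain falls from 1 towards 0
    return faded


print(apply_fade([1.0] * 8, 4))
# -> [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
```

The ramp removes the clicks that would otherwise appear where two chunks with different playback speeds are concatenated.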
--- /dev/null
+++ talks_reducer-0.1.0/talks_reducer/chunks.py
@@ -0,0 +1,92 @@
+"""Chunk creation utilities used by the talks reducer pipeline."""
+
+from __future__ import annotations
+
+from typing import List, Sequence, Tuple
+
+import numpy as np
+
+from .audio import get_max_volume
+
+
+def detect_loud_frames(
+    audio_data: np.ndarray,
+    audio_frame_count: int,
+    samples_per_frame: float,
+    max_audio_volume: float,
+    silent_threshold: float,
+) -> np.ndarray:
+    """Return a boolean array indicating which frames contain loud audio."""
+
+    normaliser = max(max_audio_volume, 1e-9)
+    has_loud_audio = np.zeros(audio_frame_count, dtype=bool)
+
+    for frame_index in range(audio_frame_count):
+        start = int(frame_index * samples_per_frame)
+        end = min(int((frame_index + 1) * samples_per_frame), audio_data.shape[0])
+        audio_chunk = audio_data[start:end]
+        chunk_max_volume = float(get_max_volume(audio_chunk)) / normaliser
+        if chunk_max_volume >= silent_threshold:
+            has_loud_audio[frame_index] = True
+
+    return has_loud_audio
+
+
+def build_chunks(
+    has_loud_audio: np.ndarray, frame_spreadage: int
+) -> Tuple[List[List[int]], np.ndarray]:
+    """Return chunks describing which frame ranges should be retained."""
+
+    audio_frame_count = len(has_loud_audio)
+    chunks: List[List[int]] = [[0, 0, 0]]
+    should_include_frame = np.zeros(audio_frame_count, dtype=bool)
+
+    for frame_index in range(audio_frame_count):
+        start = int(max(0, frame_index - frame_spreadage))
+        end = int(min(audio_frame_count, frame_index + 1 + frame_spreadage))
+        should_include_frame[frame_index] = np.any(has_loud_audio[start:end])
+        if (
+            frame_index >= 1
+            and should_include_frame[frame_index]
+            != should_include_frame[frame_index - 1]
+        ):
+            chunks.append(
+                [chunks[-1][1], frame_index, int(should_include_frame[frame_index - 1])]
+            )
+
+    chunks.append(
+        [
+            chunks[-1][1],
+            audio_frame_count,
+            int(should_include_frame[audio_frame_count - 1]),
+        ]
+    )
+    return chunks[1:], should_include_frame
+
+
+def get_tree_expression(chunks: Sequence[Sequence[int]]) -> str:
+    """Return the FFmpeg expression needed to map chunk timing updates."""
+
+    return "{}/TB/FR".format(_get_tree_expression_rec(chunks))
+
+
+def _get_tree_expression_rec(chunks: Sequence[Sequence[int]]) -> str:
+    if len(chunks) > 1:
+        split_index = int(len(chunks) / 2)
+        center = chunks[split_index]
+        return "if(lt(N,{}),{},{})".format(
+            center[0],
+            _get_tree_expression_rec(chunks[:split_index]),
+            _get_tree_expression_rec(chunks[split_index:]),
+        )
+    chunk = chunks[0]
+    local_speedup = (chunk[3] - chunk[2]) / (chunk[1] - chunk[0])
+    offset = -chunk[0] * local_speedup + chunk[2]
+    return "N*{}{:+}".format(local_speedup, offset)
+
+
+__all__ = [
+    "detect_loud_frames",
+    "build_chunks",
+    "get_tree_expression",
+]
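A worked example may make the chunk list and the generated FFmpeg expression easier to follow. The sketch below re-implements the two ideas from `chunks.py` in plain Python (no NumPy) purely for illustration. The names `build_chunks` and `tree_expression` here are local to the example, the input flags are made up, and the `timed` list assumes output timings that would normally come from `process_audio_chunks`.

```python
from typing import List, Sequence


def build_chunks(has_loud_audio: Sequence[bool], frame_margin: int) -> List[List[int]]:
    """Group frames into [start, end, is_sounded] runs (pure-Python sketch)."""
    count = len(has_loud_audio)
    # A frame is kept when any frame within `frame_margin` of it is loud.
    include = [
        any(has_loud_audio[max(0, i - frame_margin) : min(count, i + 1 + frame_margin)])
        for i in range(count)
    ]
    chunks: List[List[int]] = []
    run_start = 0
    for i in range(1, count):
        if include[i] != include[i - 1]:
            chunks.append([run_start, i, int(include[i - 1])])
            run_start = i
    chunks.append([run_start, count, int(include[-1])])
    return chunks


def tree_expression(chunks: Sequence[Sequence[int]]) -> str:
    """Binary-search expression over [in_start, in_end, out_start, out_end] chunks."""
    if len(chunks) > 1:
        mid = len(chunks) // 2
        return "if(lt(N,{}),{},{})".format(
            chunks[mid][0], tree_expression(chunks[:mid]), tree_expression(chunks[mid:])
        )
    in_start, in_end, out_start, out_end = chunks[0]
    speedup = (out_end - out_start) / (in_end - in_start)
    return "N*{}{:+}".format(speedup, out_start - in_start * speedup)


# Ten frames, two of them loud, with a one-frame margin on each side.
flags = [False, False, False, True, True, False, False, False, False, False]
print(build_chunks(flags, frame_margin=1))
# -> [[0, 2, 0], [2, 6, 1], [6, 10, 0]]

# Assumed timings: 4 silent frames shrink to 1, 6 sounded frames stay 6,
# and 8 silent frames shrink to 2.
timed = [[0, 4, 0, 1], [4, 10, 1, 7], [10, 18, 7, 9]]
print(tree_expression(timed) + "/TB/FR")
# -> if(lt(N,4),N*0.25+0.0,if(lt(N,10),N*1.0-3.0,N*0.25+4.5))/TB/FR
```

Nesting the `if(lt(N,...))` tests as a balanced tree keeps the number of comparisons per frame logarithmic in the number of chunks, which matters for talks that split into thousands of chunks.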
--- /dev/null
+++ talks_reducer-0.1.0/talks_reducer/cli.py
@@ -0,0 +1,129 @@
+"""Command line interface for the talks reducer package."""
+
+from __future__ import annotations
+
+import argparse
+import os
+import time
+from typing import Dict, List
+
+from . import audio
+from .pipeline import speed_up_video
+
+
+def _build_parser() -> argparse.ArgumentParser:
+    """Create the argument parser used by the command line interface."""
+
+    parser = argparse.ArgumentParser(
+        description="Modifies a video file to play at different speeds when there is sound vs. silence.",
+    )
+    parser.add_argument(
+        "input_file",
+        type=str,
+        nargs="+",
+        help="The video file(s) you want modified. Can be one or more directories and / or single files.",
+    )
+    parser.add_argument(
+        "-o",
+        "--output_file",
+        type=str,
+        dest="output_file",
+        help="The output file. Only usable if a single file is given. If not included, it'll append _ALTERED to the name.",
+    )
+    parser.add_argument(
+        "--temp_folder",
+        type=str,
+        default="TEMP",
+        help="The file path of the temporary working folder.",
+    )
+    parser.add_argument(
+        "-t",
+        "--silent_threshold",
+        type=float,
+        dest="silent_threshold",
+        help="The volume amount that frames' audio needs to surpass to be considered sounded. Defaults to 0.03.",
+    )
+    parser.add_argument(
+        "-S",
+        "--sounded_speed",
+        type=float,
+        dest="sounded_speed",
+        help="The speed that sounded (spoken) frames should be played at. Defaults to 1.",
+    )
+    parser.add_argument(
+        "-s",
+        "--silent_speed",
+        type=float,
+        dest="silent_speed",
+        help="The speed that silent frames should be played at. Defaults to 4.",
+    )
+    parser.add_argument(
+        "-fm",
+        "--frame_margin",
+        type=float,
+        dest="frame_spreadage",
+        help="Some silent frames adjacent to sounded frames are included to provide context. Defaults to 2.",
+    )
+    parser.add_argument(
+        "-sr",
+        "--sample_rate",
+        type=float,
+        dest="sample_rate",
+        help="Sample rate of the input and output videos. Usually extracted automatically by FFmpeg.",
+    )
+    parser.add_argument(
+        "--small",
+        action="store_true",
+        help="Apply small file optimizations: resize video to 720p, audio to 128k bitrate, best compression (uses CUDA if available).",
+    )
+    return parser
+
+
+def _gather_input_files(paths: List[str]) -> List[str]:
+    """Expand provided paths into a flat list of files that contain audio streams."""
+
+    files: List[str] = []
+    for input_path in paths:
+        if os.path.isfile(input_path) and audio.is_valid_input_file(input_path):
+            files.append(os.path.abspath(input_path))
+        elif os.path.isdir(input_path):
+            for file in os.listdir(input_path):
+                candidate = os.path.join(input_path, file)
+                if audio.is_valid_input_file(candidate):
+                    files.append(candidate)
+    return files
+
+
+def main() -> None:
+    """Entry point for the command line interface."""
+
+    parser = _build_parser()
+    parsed_args = parser.parse_args()
+    start_time = time.time()
+
+    files = _gather_input_files(parsed_args.input_file)
+
+    args: Dict[str, object] = {
+        k: v for k, v in vars(parsed_args).items() if v is not None
+    }
+    del args["input_file"]
+
+    if len(files) > 1 and "output_file" in args:
+        del args["output_file"]
+
+    for index, file in enumerate(files):
+        print(f"Processing file {index + 1}/{len(files)} '{os.path.basename(file)}'")
+        local_options = dict(args)
+        local_options["input_file"] = file
+        local_options["small"] = bool(local_options.get("small", False))
+        speed_up_video(**local_options)
+
+    end_time = time.time()
+    total_time = end_time - start_time
+    hours, remainder = divmod(total_time, 3600)
+    minutes, seconds = divmod(remainder, 60)
+    print(f"\nTime: {int(hours)}h {int(minutes)}m {seconds:.2f}s")
+
+
+if __name__ == "__main__":
+    main()
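`main()` drops every option that `argparse` left as `None` before calling the pipeline, so the defaults declared on the pipeline function apply. The reduced sketch below shows that pattern on its own. The `speed_up_video` defined here is a hypothetical stand-in that only echoes its arguments; the real signature lives in `pipeline.py`, which is not shown in this part of the diff, and the `4.0` default is assumed from the `--silent_speed` help text.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("input_file", nargs="+")
parser.add_argument("-s", "--silent_speed", type=float)  # no default: stays None
parser.add_argument("--small", action="store_true")


def speed_up_video(input_file, silent_speed=4.0, small=False):
    """Hypothetical stand-in for the pipeline entry point; echoes its arguments."""
    return {"input_file": input_file, "silent_speed": silent_speed, "small": small}


def run(argv):
    parsed = parser.parse_args(argv)
    # Keep only options the user actually supplied so callee defaults survive.
    options = {k: v for k, v in vars(parsed).items() if v is not None}
    files = options.pop("input_file")
    return [speed_up_video(input_file=f, **options) for f in files]


print(run(["a.mp4"]))
# -> [{'input_file': 'a.mp4', 'silent_speed': 4.0, 'small': False}]
print(run(["a.mp4", "b.mp4", "-s", "8"]))
# -> [{'input_file': 'a.mp4', 'silent_speed': 8.0, 'small': False},
#     {'input_file': 'b.mp4', 'silent_speed': 8.0, 'small': False}]
```

Keeping defaults in one place (the callee) avoids the parser and the pipeline drifting apart, at the cost of help strings that have to restate the default by hand.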
--- /dev/null
+++ talks_reducer-0.1.0/talks_reducer/ffmpeg.py
@@ -0,0 +1,269 @@
+"""Utilities for discovering and invoking FFmpeg commands."""
+
+from __future__ import annotations
+
+import os
+import re
+import subprocess
+import sys
+from functools import partial
+from typing import List, Optional, Tuple
+
+from tqdm import tqdm as std_tqdm
+
+
+def find_ffmpeg() -> Optional[str]:
+    """Locate the FFmpeg executable in common installation locations."""
+
+    env_override = os.environ.get("TALKS_REDUCER_FFMPEG") or os.environ.get(
+        "FFMPEG_PATH"
+    )
+    if env_override and (os.path.isfile(env_override) or shutil_which(env_override)):
+        return (
+            os.path.abspath(env_override)
+            if os.path.isfile(env_override)
+            else env_override
+        )
+
+    common_paths = [
+        "C:\\ProgramData\\chocolatey\\bin\\ffmpeg.exe",
+        "C:\\Program Files\\ffmpeg\\bin\\ffmpeg.exe",
+        "C:\\ffmpeg\\bin\\ffmpeg.exe",
+        "ffmpeg",
+    ]
+
+    for path in common_paths:
+        if os.path.isfile(path) or shutil_which(path):
+            return os.path.abspath(path) if os.path.isfile(path) else path
+
+    return None
+
+
+def _resolve_ffmpeg_path() -> str:
+    """Resolve the FFmpeg executable path or exit with a helpful message."""
+
+    ffmpeg_path = find_ffmpeg()
+    if not ffmpeg_path:
+        print(
+            "Error: FFmpeg not found. Please install FFmpeg and add it to your PATH or specify the full path.",
+            file=sys.stderr,
+        )
+        raise SystemExit(1)
+
+    print(f"Using FFmpeg at: {ffmpeg_path}")
+    return ffmpeg_path
+
+
+FFMPEG_PATH = _resolve_ffmpeg_path()
+
+tqdm = partial(
+    std_tqdm,
+    bar_format=(
+        "{desc:<20} {percentage:3.0f}%"
+        "|{bar:10}|"
+        " {n_fmt:>6}/{total_fmt:>6} [{elapsed:^5}<{remaining:^5}, {rate_fmt}{postfix}]"
+    ),
+)
+
+
+def check_cuda_available(ffmpeg_path: str = FFMPEG_PATH) -> bool:
+    """Return whether CUDA hardware encoders are available in the FFmpeg build."""
+
+    try:
+        result = subprocess.run(
+            [ffmpeg_path, "-encoders"], capture_output=True, text=True, timeout=5
+        )
+    except (
+        subprocess.TimeoutExpired,
+        subprocess.CalledProcessError,
+        FileNotFoundError,
+    ):
+        return False
+
+    if result.returncode != 0:
+        return False
+
+    encoder_list = result.stdout.lower()
+    return any(
+        encoder in encoder_list for encoder in ["h264_nvenc", "hevc_nvenc", "nvenc"]
+    )
+
+
+def run_timed_ffmpeg_command(command: str, **kwargs) -> None:
+    """Execute an FFmpeg command while streaming progress information to ``tqdm``."""
+
+    import shlex
+
+    try:
+        args = shlex.split(command)
+    except Exception as exc:  # pragma: no cover - defensive logging
+        print(f"Error parsing command: {exc}", file=sys.stderr)
+        raise
+
+    try:
+        process = subprocess.Popen(
+            args,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            universal_newlines=True,
+            bufsize=1,
+            errors="replace",
+        )
+    except Exception as exc:  # pragma: no cover - defensive logging
+        print(f"Error starting FFmpeg: {exc}", file=sys.stderr)
+        raise
+
+    with tqdm(**kwargs) as progress:
+        while True:
+            line = process.stderr.readline()
+            if not line and process.poll() is not None:
+                break
+
+            if not line:
+                continue
+
+            sys.stderr.write(line)
+            sys.stderr.flush()
+
+            match = re.search(r"frame=\s*(\d+)", line)
+            if match:
+                try:
+                    new_frame = int(match.group(1))
+                    if progress.total < new_frame:
+                        progress.total = new_frame
+                    progress.update(new_frame - progress.n)
+                except (ValueError, IndexError):
+                    pass
+
+        process.wait()
+
+        if process.returncode != 0:
+            error_output = process.stderr.read()
+            print(
+                f"\nFFmpeg error (return code {process.returncode}):", file=sys.stderr
+            )
+            print(error_output, file=sys.stderr)
+            raise subprocess.CalledProcessError(process.returncode, args)
+
+        if progress.n < progress.total:
+            progress.update(progress.total - progress.n)
+
+
+def build_extract_audio_command(
+    input_file: str,
+    output_wav: str,
+    sample_rate: int,
+    audio_bitrate: str,
+    hwaccel: Optional[List[str]] = None,
+    ffmpeg_path: str = FFMPEG_PATH,
+) -> str:
+    """Build the FFmpeg command used to extract audio into a temporary WAV file."""
+
+    hwaccel = hwaccel or []
+    command_parts: List[str] = [f'"{ffmpeg_path}"']
+    command_parts.extend(hwaccel)
+    command_parts.extend(
+        [
+            f'-i "{input_file}"',
+            f"-ab {audio_bitrate} -ac 2",
+            f"-ar {sample_rate}",
+            "-vn",
+            f'"{output_wav}"',
+            "-hide_banner -loglevel warning -stats",
+        ]
+    )
+    return " ".join(command_parts)
+
+
+def build_video_commands(
+    input_file: str,
+    audio_file: str,
+    filter_script: str,
+    output_file: str,
+    *,
+    ffmpeg_path: str = FFMPEG_PATH,
+    cuda_available: bool,
+    small: bool,
+) -> Tuple[str, Optional[str], bool]:
+    """Create the FFmpeg command strings used to render the final video output."""
+
+    global_parts: List[str] = [f'"{ffmpeg_path}"', "-y"]
+    hwaccel_args: List[str] = []
+
+    if cuda_available and not small:
+        hwaccel_args = ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
+        global_parts.extend(hwaccel_args)
+    elif small and cuda_available:
+        pass
+
+    input_parts = [f'-i "{input_file}"', f'-i "{audio_file}"']
+
+    output_parts = [
+        "-map 0 -map -0:a -map 1:a",
+        f'-filter_script:v "{filter_script}"',
|
|
204
|
+
]
|
|
205
|
+
|
|
206
|
+
video_encoder_args: List[str]
|
|
207
|
+
fallback_encoder_args: List[str] = []
|
|
208
|
+
use_cuda_encoder = False
|
|
209
|
+
|
|
210
|
+
if small:
|
|
211
|
+
if cuda_available:
|
|
212
|
+
use_cuda_encoder = True
|
|
213
|
+
video_encoder_args = [
|
|
214
|
+
"-c:v h264_nvenc",
|
|
215
|
+
"-preset p1",
|
|
216
|
+
"-cq 28",
|
|
217
|
+
"-tune",
|
|
218
|
+
"ll",
|
|
219
|
+
]
|
|
220
|
+
fallback_encoder_args = [
|
|
221
|
+
"-c:v libx264",
|
|
222
|
+
"-preset veryfast",
|
|
223
|
+
"-crf 24",
|
|
224
|
+
"-tune",
|
|
225
|
+
"zerolatency",
|
|
226
|
+
]
|
|
227
|
+
else:
|
|
228
|
+
video_encoder_args = [
|
|
229
|
+
"-c:v libx264",
|
|
230
|
+
"-preset veryfast",
|
|
231
|
+
"-crf 24",
|
|
232
|
+
"-tune",
|
|
233
|
+
"zerolatency",
|
|
234
|
+
]
|
|
235
|
+
else:
|
|
236
|
+
global_parts.append("-filter_complex_threads 1")
|
|
237
|
+
if cuda_available:
|
|
238
|
+
video_encoder_args = ["-c:v h264_nvenc"]
|
|
239
|
+
use_cuda_encoder = True
|
|
240
|
+
else:
|
|
241
|
+
video_encoder_args = ["-c:v copy"]
|
|
242
|
+
|
|
243
|
+
audio_parts = ["-c:a aac", f'"{output_file}"', "-loglevel info -stats -hide_banner"]
|
|
244
|
+
|
|
245
|
+
full_command_parts = (
|
|
246
|
+
global_parts + input_parts + output_parts + video_encoder_args + audio_parts
|
|
247
|
+
)
|
|
248
|
+
command_str = " ".join(full_command_parts)
|
|
249
|
+
|
|
250
|
+
fallback_command_str: Optional[str] = None
|
|
251
|
+
if fallback_encoder_args:
|
|
252
|
+
fallback_parts = (
|
|
253
|
+
global_parts
|
|
254
|
+
+ input_parts
|
|
255
|
+
+ output_parts
|
|
256
|
+
+ fallback_encoder_args
|
|
257
|
+
+ audio_parts
|
|
258
|
+
)
|
|
259
|
+
fallback_command_str = " ".join(fallback_parts)
|
|
260
|
+
|
|
261
|
+
return command_str, fallback_command_str, use_cuda_encoder
|
|
262
|
+
|
|
263
|
+
|
|
264
|
+
def shutil_which(cmd: str) -> Optional[str]:
|
|
265
|
+
"""Wrapper around :func:`shutil.which` for easier testing."""
|
|
266
|
+
|
|
267
|
+
from shutil import which as _which
|
|
268
|
+
|
|
269
|
+
return _which(cmd)
|
|
@@ -0,0 +1,244 @@
|
|
|
1
|
+
"""High-level pipeline orchestration for Talks Reducer."""
|
|
2
|
+
|
|
3
|
+
from __future__ import annotations
|
|
4
|
+
|
|
5
|
+
import math
|
|
6
|
+
import os
|
|
7
|
+
import re
|
|
8
|
+
import subprocess
|
|
9
|
+
from typing import Dict, Optional
|
|
10
|
+
|
|
11
|
+
import numpy as np
|
|
12
|
+
from scipy.io import wavfile
|
|
13
|
+
|
|
14
|
+
from . import audio as audio_utils
|
|
15
|
+
from . import chunks as chunk_utils
|
|
16
|
+
from .ffmpeg import (
|
|
17
|
+
FFMPEG_PATH,
|
|
18
|
+
build_extract_audio_command,
|
|
19
|
+
build_video_commands,
|
|
20
|
+
check_cuda_available,
|
|
21
|
+
run_timed_ffmpeg_command,
|
|
22
|
+
)
|
|
23
|
+
|
|
24
|
+
|
|
25
|
+
def _input_to_output_filename(filename: str, small: bool = False) -> str:
|
|
26
|
+
# os.path.splitext handles names without an extension and dots in directory names
root, extension = os.path.splitext(filename)
|
|
27
|
+
suffix = "_speedup_small" if small else "_speedup"
|
|
28
|
+
return root + suffix + extension
|
|
29
|
+
|
|
30
|
+
|
|
31
|
+
def _create_path(path: str) -> None:
|
|
32
|
+
try:
|
|
33
|
+
os.mkdir(path)
|
|
34
|
+
except OSError as exc: # pragma: no cover - defensive logging
|
|
35
|
+
raise AssertionError(
|
|
36
|
+
"Creation of the directory failed. (The TEMP folder may already exist. Delete or rename it, and try again.)"
|
|
37
|
+
) from exc
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
def _delete_path(path: str) -> None:
|
|
41
|
+
import time
|
|
42
|
+
from shutil import rmtree
|
|
43
|
+
|
|
44
|
+
try:
|
|
45
|
+
rmtree(path, ignore_errors=False)
|
|
46
|
+
for i in range(5):
|
|
47
|
+
if not os.path.exists(path):
|
|
48
|
+
return
|
|
49
|
+
time.sleep(0.01 * i)
|
|
50
|
+
except OSError as exc: # pragma: no cover - defensive logging
|
|
51
|
+
print(f"Deletion of the directory {path} failed")
|
|
52
|
+
print(exc)
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
def _extract_video_metadata(input_file: str, frame_rate: float) -> Dict[str, float]:
|
|
56
|
+
command = [  # argument list: a plain string without shell=True fails on POSIX
|
|
57
|
+
"ffprobe", "-i", input_file, "-hide_banner", "-loglevel", "error", "-select_streams", "v",
|
|
58
|
+
"-show_entries", "format=duration:stream=avg_frame_rate",
|
|
59
|
+
]
|
|
60
|
+
process = subprocess.Popen(
|
|
61
|
+
command,
|
|
62
|
+
stdout=subprocess.PIPE,
|
|
63
|
+
stderr=subprocess.PIPE,
|
|
64
|
+
bufsize=1,
|
|
65
|
+
universal_newlines=True,
|
|
66
|
+
)
|
|
67
|
+
stdout, _ = process.communicate()
|
|
68
|
+
|
|
69
|
+
match_frame_rate = re.search(r"frame_rate=(\d+)/(\d+)", str(stdout))
|
|
70
|
+
if match_frame_rate is not None and int(match_frame_rate.group(2)) != 0:  # skip "0/0"
|
|
71
|
+
frame_rate = float(match_frame_rate.group(1)) / float(match_frame_rate.group(2))
|
|
72
|
+
|
|
73
|
+
match_duration = re.search(r"duration=([\d.]*)", str(stdout))
|
|
74
|
+
original_duration = float(match_duration.group(1)) if match_duration else 0.0
|
|
75
|
+
|
|
76
|
+
return {"frame_rate": frame_rate, "duration": original_duration}
|
|
77
|
+
|
|
78
|
+
|
|
79
|
+
def _ensure_two_dimensional(audio_data: np.ndarray) -> np.ndarray:
|
|
80
|
+
if audio_data.ndim == 1:
|
|
81
|
+
return audio_data[:, np.newaxis]
|
|
82
|
+
return audio_data
|
|
83
|
+
|
|
84
|
+
|
|
85
|
+
def _prepare_output_audio(output_audio_data: np.ndarray) -> np.ndarray:
|
|
86
|
+
if output_audio_data.ndim == 2 and output_audio_data.shape[1] == 1:
|
|
87
|
+
return output_audio_data[:, 0]
|
|
88
|
+
return output_audio_data
|
|
89
|
+
|
|
90
|
+
|
|
91
|
+
def speed_up_video(
|
|
92
|
+
input_file: str,
|
|
93
|
+
output_file: Optional[str] = None,
|
|
94
|
+
frame_rate: float = 30,
|
|
95
|
+
sample_rate: int = 44100,
|
|
96
|
+
silent_threshold: float = 0.03,
|
|
97
|
+
silent_speed: float = 4.0,
|
|
98
|
+
sounded_speed: float = 1.0,
|
|
99
|
+
frame_spreadage: int = 2,
|
|
100
|
+
audio_fade_envelope_size: int = 400,
|
|
101
|
+
temp_folder: str = "TEMP",
|
|
102
|
+
small: bool = False,
|
|
103
|
+
) -> None:
|
|
104
|
+
"""Speed up a video by shortening silent sections while keeping sounded sections intact."""
|
|
105
|
+
|
|
106
|
+
if output_file is None:
|
|
107
|
+
output_file = _input_to_output_filename(input_file, small)
|
|
108
|
+
|
|
109
|
+
cuda_available = check_cuda_available()
|
|
110
|
+
|
|
111
|
+
if os.path.exists(temp_folder):
|
|
112
|
+
_delete_path(temp_folder)
|
|
113
|
+
_create_path(temp_folder)
|
|
114
|
+
|
|
115
|
+
metadata = _extract_video_metadata(input_file, frame_rate)
|
|
116
|
+
frame_rate = metadata["frame_rate"]
|
|
117
|
+
original_duration = metadata["duration"]
|
|
118
|
+
|
|
119
|
+
hwaccel = (
|
|
120
|
+
["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"] if cuda_available else []
|
|
121
|
+
)
|
|
122
|
+
audio_bitrate = "128k" if small else "160k"
|
|
123
|
+
audio_wav = os.path.join(temp_folder, "audio.wav")
|
|
124
|
+
|
|
125
|
+
extract_command = build_extract_audio_command(
|
|
126
|
+
input_file,
|
|
127
|
+
audio_wav,
|
|
128
|
+
sample_rate,
|
|
129
|
+
audio_bitrate,
|
|
130
|
+
hwaccel,
|
|
131
|
+
)
|
|
132
|
+
|
|
133
|
+
run_timed_ffmpeg_command(
|
|
134
|
+
extract_command,
|
|
135
|
+
total=int(original_duration * frame_rate),
|
|
136
|
+
unit="frames",
|
|
137
|
+
desc="Extracting audio:",
|
|
138
|
+
)
|
|
139
|
+
|
|
140
|
+
wav_sample_rate, audio_data = wavfile.read(audio_wav)
|
|
141
|
+
audio_data = _ensure_two_dimensional(audio_data)
|
|
142
|
+
audio_sample_count = audio_data.shape[0]
|
|
143
|
+
max_audio_volume = audio_utils.get_max_volume(audio_data)
|
|
144
|
+
|
|
145
|
+
print("\nProcessing Information:")
|
|
146
|
+
print(f"- Max Audio Volume: {max_audio_volume}")
|
|
147
|
+
print(f"- Processing on: {'GPU (CUDA)' if cuda_available else 'CPU'}")
|
|
148
|
+
if small:
|
|
149
|
+
print("- Small mode: 720p video, 128k audio, optimized compression")
|
|
150
|
+
|
|
151
|
+
samples_per_frame = wav_sample_rate / frame_rate
|
|
152
|
+
audio_frame_count = int(math.ceil(audio_sample_count / samples_per_frame))
|
|
153
|
+
|
|
154
|
+
has_loud_audio = chunk_utils.detect_loud_frames(
|
|
155
|
+
audio_data,
|
|
156
|
+
audio_frame_count,
|
|
157
|
+
samples_per_frame,
|
|
158
|
+
max_audio_volume,
|
|
159
|
+
silent_threshold,
|
|
160
|
+
)
|
|
161
|
+
|
|
162
|
+
chunks, _ = chunk_utils.build_chunks(has_loud_audio, frame_spreadage)
|
|
163
|
+
|
|
164
|
+
print(f"Generated {len(chunks)} chunks:")
|
|
165
|
+
for index, chunk in enumerate(chunks[:5]):
|
|
166
|
+
print(f" Chunk {index}: {chunk}")
|
|
167
|
+
if len(chunks) > 5:
|
|
168
|
+
print(f" ... and {len(chunks) - 5} more chunks")
|
|
169
|
+
|
|
170
|
+
new_speeds = [silent_speed, sounded_speed]
|
|
171
|
+
output_audio_data, updated_chunks = audio_utils.process_audio_chunks(
|
|
172
|
+
audio_data,
|
|
173
|
+
chunks,
|
|
174
|
+
samples_per_frame,
|
|
175
|
+
new_speeds,
|
|
176
|
+
audio_fade_envelope_size,
|
|
177
|
+
max_audio_volume,
|
|
178
|
+
)
|
|
179
|
+
|
|
180
|
+
audio_new_path = os.path.join(temp_folder, "audioNew.wav")
|
|
181
|
+
wavfile.write(audio_new_path, sample_rate, _prepare_output_audio(output_audio_data))
|
|
182
|
+
|
|
183
|
+
expression = chunk_utils.get_tree_expression(updated_chunks)
|
|
184
|
+
filter_graph_path = os.path.join(temp_folder, "filterGraph.txt")
|
|
185
|
+
with open(filter_graph_path, "w", encoding="utf-8") as filter_graph_file:
|
|
186
|
+
filter_parts = []
|
|
187
|
+
if small:
|
|
188
|
+
filter_parts.append("scale=-2:720")
|
|
189
|
+
filter_parts.append(f"fps=fps={frame_rate}")
|
|
190
|
+
filter_parts.append("setpts=" + expression.replace(",", "\\,"))  # a backslash inside an f-string expression is a SyntaxError before Python 3.12
|
|
191
|
+
filter_graph_file.write(",".join(filter_parts))
|
|
192
|
+
|
|
193
|
+
command_str, fallback_command_str, use_cuda_encoder = build_video_commands(
|
|
194
|
+
input_file,
|
|
195
|
+
audio_new_path,
|
|
196
|
+
filter_graph_path,
|
|
197
|
+
output_file,
|
|
198
|
+
ffmpeg_path=FFMPEG_PATH,
|
|
199
|
+
cuda_available=cuda_available,
|
|
200
|
+
small=small,
|
|
201
|
+
)
|
|
202
|
+
|
|
203
|
+
output_dir = os.path.dirname(os.path.abspath(output_file))
|
|
204
|
+
if output_dir and not os.path.exists(output_dir):
|
|
205
|
+
print(f"Creating output directory: {output_dir}")
|
|
206
|
+
os.makedirs(output_dir, exist_ok=True)
|
|
207
|
+
|
|
208
|
+
print("\nExecuting FFmpeg command:")
|
|
209
|
+
print(command_str)
|
|
210
|
+
|
|
211
|
+
if not os.path.exists(audio_new_path):
|
|
212
|
+
print("ERROR: Audio file not found!")
|
|
213
|
+
_delete_path(temp_folder)
|
|
214
|
+
return
|
|
215
|
+
|
|
216
|
+
if not os.path.exists(filter_graph_path):
|
|
217
|
+
print("ERROR: Filter file not found!")
|
|
218
|
+
_delete_path(temp_folder)
|
|
219
|
+
return
|
|
220
|
+
|
|
221
|
+
try:
|
|
222
|
+
run_timed_ffmpeg_command(
|
|
223
|
+
command_str,
|
|
224
|
+
total=updated_chunks[-1][3],
|
|
225
|
+
unit="frames",
|
|
226
|
+
desc="Generating final:",
|
|
227
|
+
)
|
|
228
|
+
except subprocess.CalledProcessError as exc:
|
|
229
|
+
if fallback_command_str and use_cuda_encoder:
|
|
230
|
+
print("CUDA encoding failed, retrying with CPU encoder...")
|
|
231
|
+
run_timed_ffmpeg_command(
|
|
232
|
+
fallback_command_str,
|
|
233
|
+
total=updated_chunks[-1][3],
|
|
234
|
+
unit="frames",
|
|
235
|
+
desc="Generating final (fallback):",
|
|
236
|
+
)
|
|
237
|
+
else:
|
|
238
|
+
print(f"\nError running FFmpeg command: {exc}")
|
|
239
|
+
print(
|
|
240
|
+
"Please check if all input files exist and FFmpeg has proper permissions."
|
|
241
|
+
)
|
|
242
|
+
raise
|
|
243
|
+
finally:
|
|
244
|
+
_delete_path(temp_folder)
|
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: talks-reducer
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: CLI for speeding up long-form talks by removing silence
|
|
5
|
+
Author: Talks Reducer Maintainers
|
|
6
|
+
License: MIT License
|
|
7
|
+
|
|
8
|
+
Copyright (c) 2019 carykh
|
|
9
|
+
Copyright (c) 2020 gegell
|
|
10
|
+
Copyright (c) 2025 Stanislav Popov
|
|
11
|
+
|
|
12
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
13
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
14
|
+
in the Software without restriction, including without limitation the rights
|
|
15
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
16
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
17
|
+
furnished to do so, subject to the following conditions:
|
|
18
|
+
|
|
19
|
+
The above copyright notice and this permission notice shall be included in all
|
|
20
|
+
copies or substantial portions of the Software.
|
|
21
|
+
|
|
22
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
23
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
24
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
25
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
26
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
27
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
28
|
+
SOFTWARE.
|
|
29
|
+
|
|
30
|
+
Requires-Python: >=3.9
|
|
31
|
+
Description-Content-Type: text/markdown
|
|
32
|
+
License-File: LICENSE
|
|
33
|
+
Requires-Dist: audiotsm>=0.1.2
|
|
34
|
+
Requires-Dist: scipy>=1.10.0
|
|
35
|
+
Requires-Dist: numpy<2.0.0,>=1.22.0
|
|
36
|
+
Requires-Dist: tqdm>=4.65.0
|
|
37
|
+
Dynamic: license-file
|
|
38
|
+
|
|
39
|
+
# Talks Reducer
|
|
40
|
+
Talks Reducer shortens long-form presentations by removing silent gaps and optionally re-encoding them to smaller files. The
|
|
41
|
+
project was renamed from **jumpcutter** to emphasize its focus on conference talks and lectures.
|
|
42
|
+
|
|
43
|
+
When CUDA-capable hardware is available, the pipeline uses GPU encoders to keep export times low; it also runs on CPU-only
|
|
44
|
+
machines.
|
|
45
|
+
|
|
46
|
+
## Repository Structure
|
|
47
|
+
- `talks_reducer/` — Python package that exposes the CLI and reusable pipeline:
|
|
48
|
+
- `cli.py` parses arguments and dispatches to the pipeline.
|
|
49
|
+
- `pipeline.py` orchestrates FFmpeg, audio processing, and temporary assets.
|
|
50
|
+
- `audio.py` handles audio validation, volume analysis, and phase vocoder processing.
|
|
51
|
+
- `chunks.py` builds timing metadata and FFmpeg expressions for frame selection.
|
|
52
|
+
- `ffmpeg.py` discovers the FFmpeg binary, checks CUDA availability, and assembles command strings.
|
|
53
|
+
- `requirements.txt` — Python dependencies for local development.
|
|
54
|
+
- `default.nix` — reproducible environment definition for Nix users.
|
|
55
|
+
- `CONTRIBUTION.md` — development workflow, formatting expectations, and release checklist.
|
|
56
|
+
- `AGENTS.md` — maintainer tips and coding conventions for this repository.
|
|
57
|
+
|
|
58
|
+
## Example
|
|
59
|
+
- 1h 37m, 571 MB — Original OBS video
|
|
60
|
+
- 1h 19m, 751 MB — Talks Reducer
|
|
61
|
+
- 1h 19m, 171 MB — Talks Reducer `--small`
|
|
62
|
+
|
|
63
|
+
The `--small` preset applies a 720p video scale and 128 kbps audio bitrate, making it useful for sharing talks over constrained
|
|
64
|
+
connections. Without `--small`, the script aims to preserve original quality while removing silence.
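
As a rough illustration of what the `--small` preset changes on the video side, the sketch below assembles an FFmpeg filter chain with an optional 720p scale step. The function name `build_filter_chain` and its parameters are hypothetical; only the `scale=-2:720`, `fps`, and `setpts` filter names mirror what the package's pipeline writes to its filter script.

```python
def build_filter_chain(frame_rate: float, setpts_expression: str, small: bool) -> str:
    """Return a comma-separated FFmpeg video filter chain (illustrative helper)."""
    parts = []
    if small:
        # Scale to 720 pixels high; -2 keeps the aspect ratio with an even width.
        parts.append("scale=-2:720")
    parts.append(f"fps=fps={frame_rate}")
    # Commas inside the setpts expression must be escaped, because a bare
    # comma separates filters in the chain.
    escaped = setpts_expression.replace(",", "\\,")
    parts.append("setpts=" + escaped)
    return ",".join(parts)


print(build_filter_chain(30, "PTS/2", small=True))
```

The escaping is done outside an f-string so the snippet also parses on Python versions older than 3.12.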
|
|
65
|
+
|
|
66
|
+
## Highlights
|
|
67
|
+
- Builds on gegell's classic jumpcutter workflow with more efficient frame and audio processing
|
|
68
|
+
- Generates FFmpeg filter graphs instead of writing temporary frames to disk
|
|
69
|
+
- Streams audio transformations in memory to avoid slow intermediate files
|
|
70
|
+
- Accepts multiple inputs or directories of recordings in a single run
|
|
71
|
+
- Provides progress feedback via `tqdm`
|
|
72
|
+
- Automatically detects NVENC availability, so you no longer need to pass `--cuda`
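
The NVENC auto-detection mentioned above can be pictured as a substring check over FFmpeg's encoder listing. This is a minimal sketch, assuming the listing text comes from something like `ffmpeg -encoders`; the helper name `has_nvenc` is hypothetical, and the encoder names are the ones checked in `ffmpeg.py`.

```python
NVENC_ENCODERS = ("h264_nvenc", "hevc_nvenc", "nvenc")


def has_nvenc(encoder_listing: str) -> bool:
    """Return True if any NVENC encoder name appears in the listing text."""
    listing = encoder_listing.lower()
    return any(name in listing for name in NVENC_ENCODERS)


sample_listing = (
    " V....D libx264              H.264 / AVC\n"
    " V....D h264_nvenc           NVIDIA NVENC H.264 encoder\n"
)
print(has_nvenc(sample_listing))
```

Keeping the check as a pure function over text makes it easy to test without a GPU or an FFmpeg binary.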
|
|
73
|
+
|
|
74
|
+
## Processing Pipeline
|
|
75
|
+
1. Validate that each input file contains an audio stream using `ffprobe`.
|
|
76
|
+
2. Extract audio and calculate loudness to identify silent regions.
|
|
77
|
+
3. Time-scale the audio with `audiotsm`: silent segments are sped up while sounded segments keep their configured speed, which preserves speech clarity.
|
|
78
|
+
4. Stitch the processed audio and video together with FFmpeg, using NVENC if the GPU encoders are detected.
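
Step 2 of the list above can be sketched as a per-frame peak comparison. This is an illustrative, pure-Python version for mono samples, not the package's `chunks.detect_loud_frames` implementation (which is not shown here); the function signature is an assumption, while the default threshold of `0.03` matches the pipeline's `silent_threshold` default.

```python
import math
from typing import List, Sequence


def detect_loud_frames(
    samples: Sequence[float],
    samples_per_frame: float,
    silent_threshold: float = 0.03,
) -> List[bool]:
    """Flag each video frame whose audio peak reaches the threshold (sketch)."""
    peak = max((abs(s) for s in samples), default=0.0)
    frame_count = math.ceil(len(samples) / samples_per_frame)
    flags = []
    for index in range(frame_count):
        start = int(index * samples_per_frame)
        end = min(int((index + 1) * samples_per_frame), len(samples))
        frame_peak = max((abs(s) for s in samples[start:end]), default=0.0)
        # Loudness is measured relative to the loudest sample in the file.
        flags.append(peak > 0 and frame_peak / peak >= silent_threshold)
    return flags


audio = [0.0] * 4 + [1.0, -0.5, 0.2, 0.0] + [0.01] * 4
print(detect_loud_frames(audio, samples_per_frame=4))
```

Frames flagged `False` are the candidates for the faster silent speed; the real pipeline additionally pads loud regions by a few frames (`frame_spreadage`) before building chunks.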
|
|
79
|
+
|
|
80
|
+
## Recent Updates
|
|
81
|
+
- **October 2025** — Project renamed to *Talks Reducer* across documentation and scripts.
|
|
82
|
+
- **October 2025** — Added `--small` preset with 720p/128 kbps defaults for bandwidth-friendly exports.
|
|
83
|
+
- **October 2025** — Removed the `--cuda` flag; CUDA/NVENC support is now auto-detected.
|
|
84
|
+
- **October 2025** — Improved `--small` encoder arguments to balance size and clarity.
|
|
85
|
+
- **October 2025** — CLI argument parsing fixes to prevent crashes on invalid combinations.
|
|
86
|
+
- **October 2025** — Added example output comparison to the README.
|
|
87
|
+
|
|
88
|
+
## Quick Start
|
|
89
|
+
1. Install FFmpeg and ensure it is on your `PATH`
|
|
90
|
+
2. Install Talks Reducer with `pip install talks-reducer` (this exposes the `talks-reducer` command)
|
|
91
|
+
3. Inspect available options with `talks-reducer --help`
|
|
92
|
+
4. Process a recording using `talks-reducer /path/to/video`
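
When no output path is given, the pipeline writes the result next to the input with a `_speedup` suffix (or `_speedup_small` with `--small`). The sketch below shows that naming rule; the helper name `default_output_name` is hypothetical, and splitting the extension with `os.path.splitext` is a choice made for this example.

```python
import os


def default_output_name(input_path: str, small: bool = False) -> str:
    """Insert the speed-up suffix before the file extension (illustrative)."""
    root, extension = os.path.splitext(input_path)
    suffix = "_speedup_small" if small else "_speedup"
    return root + suffix + extension


print(default_output_name("talk.mp4"))
print(default_output_name("talk.mp4", small=True))
```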
|
|
93
|
+
|
|
94
|
+
## Requirements
|
|
95
|
+
- Python 3.9 or newer with `numpy`, `scipy`, `audiotsm`, and `tqdm`
|
|
96
|
+
- FFmpeg with optional NVIDIA NVENC support for CUDA acceleration
|
|
97
|
+
|
|
98
|
+
## Contributing
|
|
99
|
+
See `CONTRIBUTION.md` for development setup details and guidance on sharing improvements.
|
|
100
|
+
|
|
101
|
+
## License
|
|
102
|
+
Talks Reducer is released under the MIT License. See `LICENSE` for the full text.
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
LICENSE
|
|
2
|
+
README.md
|
|
3
|
+
pyproject.toml
|
|
4
|
+
talks_reducer/__init__.py
|
|
5
|
+
talks_reducer/__main__.py
|
|
6
|
+
talks_reducer/audio.py
|
|
7
|
+
talks_reducer/chunks.py
|
|
8
|
+
talks_reducer/cli.py
|
|
9
|
+
talks_reducer/ffmpeg.py
|
|
10
|
+
talks_reducer/pipeline.py
|
|
11
|
+
talks_reducer.egg-info/PKG-INFO
|
|
12
|
+
talks_reducer.egg-info/SOURCES.txt
|
|
13
|
+
talks_reducer.egg-info/dependency_links.txt
|
|
14
|
+
talks_reducer.egg-info/entry_points.txt
|
|
15
|
+
talks_reducer.egg-info/requires.txt
|
|
16
|
+
talks_reducer.egg-info/top_level.txt
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
talks_reducer
|