violin-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- violin-0.1.0/.claude/skills/video-translator/SKILL.md +67 -0
- violin-0.1.0/.dockerignore +17 -0
- violin-0.1.0/.env.example +21 -0
- violin-0.1.0/.gitignore +18 -0
- violin-0.1.0/.python-version +1 -0
- violin-0.1.0/Caddyfile +12 -0
- violin-0.1.0/Dockerfile +19 -0
- violin-0.1.0/LICENSE +21 -0
- violin-0.1.0/PKG-INFO +315 -0
- violin-0.1.0/README.md +276 -0
- violin-0.1.0/api/__init__.py +0 -0
- violin-0.1.0/api/app.py +109 -0
- violin-0.1.0/api/config.py +15 -0
- violin-0.1.0/api/models.py +87 -0
- violin-0.1.0/api/routes/__init__.py +0 -0
- violin-0.1.0/api/routes/catalog.py +190 -0
- violin-0.1.0/api/routes/chat.py +39 -0
- violin-0.1.0/api/routes/files.py +133 -0
- violin-0.1.0/api/routes/jobs.py +228 -0
- violin-0.1.0/api/static/demo/hassan_gpt_oss_en.mp4 +0 -0
- violin-0.1.0/api/static/demo/hassan_gpt_oss_ko.mp4 +0 -0
- violin-0.1.0/api/static/demo/percy_en.mp4 +0 -0
- violin-0.1.0/api/static/demo/percy_zh.mp4 +0 -0
- violin-0.1.0/api/static/demo/posters/dario_interview_it.jpg +0 -0
- violin-0.1.0/api/static/demo/posters/hassan_gpt_oss_ko.jpg +0 -0
- violin-0.1.0/api/static/demo/posters/percy_zh.jpg +0 -0
- violin-0.1.0/api/static/index.html +1644 -0
- violin-0.1.0/api/stats.py +141 -0
- violin-0.1.0/api/storage.py +241 -0
- violin-0.1.0/api/usage.py +61 -0
- violin-0.1.0/api/video_chat.py +185 -0
- violin-0.1.0/api/worker.py +237 -0
- violin-0.1.0/assets/demo_en.mp4 +0 -0
- violin-0.1.0/assets/outcome.png +0 -0
- violin-0.1.0/config/default.yaml +88 -0
- violin-0.1.0/config/other_api.yaml +18 -0
- violin-0.1.0/config/prod.yaml +21 -0
- violin-0.1.0/docker-compose.yml +26 -0
- violin-0.1.0/main.py +219 -0
- violin-0.1.0/pipeline/__init__.py +0 -0
- violin-0.1.0/pipeline/config.py +49 -0
- violin-0.1.0/pipeline/costs.py +123 -0
- violin-0.1.0/pipeline/extractor.py +67 -0
- violin-0.1.0/pipeline/ffmpeg_utils.py +47 -0
- violin-0.1.0/pipeline/languages.py +48 -0
- violin-0.1.0/pipeline/llm_client.py +212 -0
- violin-0.1.0/pipeline/merger.py +642 -0
- violin-0.1.0/pipeline/orchestrator.py +219 -0
- violin-0.1.0/pipeline/pricing.py +47 -0
- violin-0.1.0/pipeline/styles.py +60 -0
- violin-0.1.0/pipeline/transcriber.py +451 -0
- violin-0.1.0/pipeline/translator.py +281 -0
- violin-0.1.0/pipeline/tts.py +122 -0
- violin-0.1.0/pipeline/tts_elevenlabs.py +283 -0
- violin-0.1.0/pipeline/tts_openai.py +141 -0
- violin-0.1.0/pipeline/tts_together.py +150 -0
- violin-0.1.0/prompts/__init__.py +34 -0
- violin-0.1.0/prompts/styles.yaml +118 -0
- violin-0.1.0/prompts/translate.yaml +106 -0
- violin-0.1.0/prompts/video_chat.yaml +14 -0
- violin-0.1.0/prompts/voice_match.yaml +16 -0
- violin-0.1.0/pyproject.toml +66 -0
- violin-0.1.0/run_api.py +65 -0
- violin-0.1.0/uv.lock +1404 -0

violin-0.1.0/.claude/skills/video-translator/SKILL.md
ADDED
@@ -0,0 +1,67 @@
---
name: video-translator
description: Dub a video into another language and generate subtitles using the default Together + Cartesia stack. Trigger when the user wants to translate / dub / voice-over a video file, or generate subtitles for it. Handles `.mp4` / `.mkv` / `.webm`. Installs as the `violin` CLI (and `violin-api` for the FastAPI server) via `uv tool install`. For alternative models (OpenAI / ElevenLabs) or custom configs, point the user to the repo: https://github.com/shang-zhu/violin.
allowed-tools: Bash, Read
---

# Violin — operating skill

Always uses the default config (Together for transcription and translation, `cartesia/sonic-3` for TTS). If the user asks for OpenAI, ElevenLabs, or custom configs, **stop and point them to the Violin repo** — those flows aren't supported through the global CLI.

## Pre-flight

Run these silently first. Abort if any fails:

```bash
command -v violin # 1. CLI on PATH
test -f "<input>" # 2. Input exists
test -n "$TOGETHER_API_KEY" # 3. Key is set (checked without printing it)
```

If `violin` is missing: tell the user to `uv tool install violin`, then `violin --install-skill` to refresh this skill file. Do not auto-install.

If `TOGETHER_API_KEY` is missing:
- Inside the Violin repo → populate `.env` (auto-loaded)
- Elsewhere → `export TOGETHER_API_KEY=...` in `~/.zshrc` / `~/.bashrc`, then `source` it

## Decisions

- **CLI vs API**: single run-and-wait file → CLI (`violin`). Multi-job / HTTP / web UI → API server (`violin-api`); print the command, don't auto-start it.
- **Style** (`--style`): default `standard`. Kids content → `kids`, formal/lecture → `academic`, casual → `casual`, dramatic → `storyteller`, news → `news`. Run `violin --style list` if unsure.
- **Voiceover**: keep default (mix dubbed audio over a quiet original). Use `--no-voiceover` only when the user explicitly says "replace audio entirely".

## Run

```bash
violin <input> <output> --language <Lang> [flags]
```

## Flags

| Flag | Default | When to set |
|------|---------|-------------|
| `--language` / `-l` | *required* | Target language (e.g. `Chinese`, `Spanish`, `Japanese`). |
| `--voice` / `-v` | auto (native voice picked by `preferences.voice_gender`) | Only when the user names a specific voice from the catalog (e.g. `"warm female narrator"`). Otherwise omit and let the default kick in. |
| `--source-language` | `auto-detect` | Only if Whisper mis-detects the source language. |
| `--style` / `-s` | `standard` | See Decisions above. |
| `--no-subtitles` | off | User says "no SRT" / "video only". |
| `--no-voiceover` | off | User says "replace original audio entirely". |
| `--config` / `-c` | `config/default.yaml` | Don't use through this skill — repo-only flow. |
| `--timings-out` | off | Only when the user wants a per-step timing JSON for debugging / benchmarking. |

## Language coverage

33 target languages total. **16** ship with handpicked native-speaker voices: Chinese, Spanish, English, Hindi, Arabic, Portuguese, Russian, Japanese, Turkish, German, Korean, French, Italian, Polish, Dutch, Swedish. The other **17** fall back to the English voice catalog (multilingual under Cartesia Sonic 3) — quality is decent but the voice isn't a native speaker. Mention this caveat only if the user is translating to a fallback language and asks about voice quality.

## Report back

- Output video path + SRT path (printed by the run).
- Total cost (printed at end — surface, don't hide).
- If voiceover was on, mention the `_original.m4a` sidecar.

## Don'ts

- Don't run on multi-GB videos without first quoting the rough cost (audio length × per-provider rates in `pipeline/pricing.py`).
- Don't fabricate a "subtitles-only" mode — the CLI requires the full pipeline. If the user only wants SRT, run the full pipeline and hand them just the `.srt`, warning them of the cost first.
- Don't try to switch to OpenAI or ElevenLabs from this skill. Point the user to the repo + `--config config/other_api.yaml` (or their own override).
- Don't paraphrase the README. For supported languages (33), voice catalog, and full flag docs, point them at `README.md` or `violin --help`.
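The pre-flight checks can also be scripted in Python, for instance when wrapping the CLI in another tool. A minimal sketch, assuming the three checks listed in the skill are the whole contract; `preflight` is a hypothetical helper, not part of the violin package, and it tests for the key without ever printing it.

```python
from __future__ import annotations

import os
import shutil


def preflight(input_path: str) -> list[str]:
    """Return a list of problems; an empty list means `violin` can be run.

    Hypothetical helper mirroring the skill's three pre-flight checks.
    """
    problems = []
    # 1. CLI on PATH (same idea as `command -v violin`)
    if shutil.which("violin") is None:
        problems.append("violin CLI not found; ask the user to run `uv tool install violin`")
    # 2. Input file exists (same idea as `test -f <input>`)
    if not os.path.isfile(input_path):
        problems.append(f"input not found: {input_path}")
    # 3. API key present; only presence is checked, the value is never printed
    if not os.environ.get("TOGETHER_API_KEY"):
        problems.append("TOGETHER_API_KEY is not set")
    return problems
```

A wrapper would stop and report these messages instead of calling `violin` whenever the returned list is non-empty.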

violin-0.1.0/.env.example
ADDED
@@ -0,0 +1,21 @@
# ── API keys ─────────────────────────────────────────────────
# Only fill in the keys for the providers you actually use — see
# config/default.yaml `models:` section. The default Together stack
# (Whisper + LLM + Cartesia TTS) needs only TOGETHER_API_KEY.

# Together AI — https://api.together.ai
TOGETHER_API_KEY=

# OpenAI — needed when any stage uses `provider: openai`
# (whisper-1, GPT models, tts-1, vision chat). https://platform.openai.com
OPENAI_API_KEY=

# ElevenLabs — needed when TTS uses `provider: elevenlabs`
# https://elevenlabs.io/app/settings/api-keys
ELEVENLABS_API_KEY=


# ── Optional ─────────────────────────────────────────────────

# CORS allow-list for the web app (default: * — any origin)
# CORS_ORIGINS=https://yourdomain.com
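The `CORS_ORIGINS` comment and the README describe a comma-separated allow-list that defaults to `*`. A minimal sketch of parsing such a value, assuming exactly that contract; `parse_cors_origins` is a hypothetical helper, not the code in `api/app.py`.

```python
from __future__ import annotations


def parse_cors_origins(raw: str | None) -> list[str]:
    """Split a comma-separated origin list; unset or blank means allow any origin."""
    if raw is None or not raw.strip():
        return ["*"]
    # Trim whitespace around each entry and drop empties left by stray commas
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

The resulting list could then be passed as `allow_origins` to FastAPI's `CORSMiddleware`, e.g. `parse_cors_origins(os.environ.get("CORS_ORIGINS"))`.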

violin-0.1.0/.python-version
ADDED
@@ -0,0 +1 @@
3.13

violin-0.1.0/Caddyfile
ADDED
@@ -0,0 +1,12 @@
# Replace with your actual domain name.
# Caddy will automatically provision a Let's Encrypt TLS certificate
# once the domain's DNS A record points to this server's public IP.
#
# For local testing without a domain, replace with:
# :80 {
#     reverse_proxy violin:8000
# }

violin-ai.com {
    reverse_proxy violin:8000
}

violin-0.1.0/Dockerfile
ADDED
@@ -0,0 +1,19 @@
FROM python:3.13-slim AS base

RUN apt-get update && \
    apt-get install -y --no-install-recommends libsndfile1 ffmpeg && \
    rm -rf /var/lib/apt/lists/*

COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project

COPY . .
RUN uv sync --frozen --no-dev

EXPOSE 8000

CMD ["uv", "run", "run_api.py", "--host", "0.0.0.0", "--port", "8000", "--config", "config/prod.yaml"]

violin-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Violin contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

violin-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,315 @@
Metadata-Version: 2.4
Name: violin
Version: 0.1.0
Summary: Open-source video dubbing — translate any video into 33 languages with native-sounding voice-over and synced subtitles.
Project-URL: Repository, https://github.com/shang-zhu/Violin
Project-URL: Issues, https://github.com/shang-zhu/Violin/issues
Author: Shang Zhu, Qinghong Lin
License: MIT
License-File: LICENSE
Keywords: dubbing,elevenlabs,subtitles,together-ai,translation,tts,video,whisper
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: aiofiles>=25.1.0
Requires-Dist: elevenlabs>=1.0.0
Requires-Dist: fastapi>=0.135.2
Requires-Dist: ffmpeg-python>=0.2.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: imageio-ffmpeg>=0.6.0
Requires-Dist: openai>=1.0.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: python-multipart>=0.0.22
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: soundfile>=0.13.1
Requires-Dist: together>=2.5.0
Requires-Dist: uvicorn>=0.42.0
Requires-Dist: yt-dlp>=2025.1.0
Description-Content-Type: text/markdown

# 🎻 Violin

**Open-source video dubbing — translate any video into 33 languages with natural-sounding voice-over and synced subtitles.**

[🌐 Live demo](https://violin-ai.com) · [📜 MIT License](https://github.com/shang-zhu/violin/blob/main/LICENSE)

Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.

Available as a **CLI**, a **FastAPI web app**, and a **Claude Code skill**.

---

## ✨ Features

- **33 target languages** with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
- **In-video Q&A** — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
- **Natural-language voice picker** — describe the voice you want, an LLM picks from the catalog
- **6 style profiles** *(experimental)* — standard / kids / academic / casual / storyteller / news
- **Pluggable stack** — swap Together / OpenAI / ElevenLabs per stage from one YAML (ElevenLabs covers TTS only)

---

## 🚀 Quick start

### Try it without installing anything

The live demo runs at **<https://violin-ai.com>** — drop a short clip in, get a dubbed video out in a few minutes.

### Run locally

Requires **Python 3.10+** and **ffmpeg** on PATH.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv if you don't have it
uv tool install violin # recommended — faster, isolated
# or: pip install violin # if you'd rather install into your current Python env

export TOGETHER_API_KEY=... # get one at https://api.together.ai (add to ~/.zshrc to persist)
```

Three ways to use it:

**1. CLI** — translate one file:

```bash
violin lecture.mp4 lecture_zh.mp4 --language Chinese
```

**2. Web app** — full REST API + browser UI:

```bash
violin-api
# → http://127.0.0.1:8000 (browser UI)
# → http://127.0.0.1:8000/docs (interactive API docs)
```

**3. Claude Code skill** — invoke from any Claude Code session:

```bash
violin --install-skill # one-time: copies the skill into ~/.claude/skills/
claude
> please use the violin skill to translate path/to/video.mp4 into Chinese
```

<details><summary>Run from source (for hacking on the pipeline)</summary>

```bash
git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env # then fill in TOGETHER_API_KEY
uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese
```
</details>

---

## 🎬 How Violin works

```
Video
  │
  ├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
  │
  ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
  │
  ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
  │
  ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
  │
  └─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
                                   concat with freeze-frame fallback,
                                   single-pass AAC encode the audio track,
                                   write output mp4 + optional SRT
```
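The final speed-alignment step comes down to per-segment arithmetic. A minimal sketch, assuming each segment's video is stretched by `audio / video` with ffmpeg's `setpts` filter and that any stretch beyond a cap is covered by cloning the last frame with `tpad`; the `max_stretch` cap, the `Alignment` type, and `align_segment` are hypothetical and not taken from `pipeline/merger.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alignment:
    pts_factor: float      # setpts multiplier: >1 slows the video down, <1 speeds it up
    freeze_seconds: float  # extra seconds to hold the last frame (0 if none)

    def video_filter(self) -> str:
        """Build an ffmpeg -vf filter string for this segment."""
        filters = [f"setpts={self.pts_factor:.4f}*PTS"]
        if self.freeze_seconds > 0:
            # tpad clones the final frame for the remaining duration
            filters.append(f"tpad=stop_mode=clone:stop_duration={self.freeze_seconds:.3f}")
        return ",".join(filters)


def align_segment(video_s: float, audio_s: float, max_stretch: float = 1.5) -> Alignment:
    """Fit a video segment of video_s seconds to dubbed audio of audio_s seconds."""
    if video_s <= 0 or audio_s <= 0:
        raise ValueError("segment durations must be positive")
    factor = audio_s / video_s
    if factor <= max_stretch:
        return Alignment(pts_factor=factor, freeze_seconds=0.0)
    # Stretch up to the cap, then freeze the final frame for what is left
    stretched = video_s * max_stretch
    return Alignment(pts_factor=max_stretch, freeze_seconds=audio_s - stretched)
```

For example, a 4 s clip under 5 s of dubbed speech gets `setpts=1.2500*PTS`, while a 2 s clip under 4 s of speech is slowed 1.5× to 3 s and then holds its last frame for 1 s.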

---

## ⚙️ Configuration

Override any default by writing your own YAML and passing it with `--config my.yaml` — only the keys you want to change need to appear; values deep-merge with the [built-in defaults](https://github.com/shang-zhu/violin/blob/main/config/default.yaml).
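Deep-merging means nested mappings are combined key by key, while any non-mapping value in the override replaces the default outright. A minimal sketch of that behavior, assuming both files load as plain dicts (for example via `yaml.safe_load`); `deep_merge` is a hypothetical helper, not necessarily how `pipeline/config.py` does it.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` layered on `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them recursively
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override wins
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "models": {
        "transcription": {"provider": "together", "model": "openai/whisper-large-v3"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}
# An override file that only switches TTS to ElevenLabs
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}
config = deep_merge(defaults, override)
```

Here `config` keeps the Together transcription settings and only the `tts` block changes, which is why an override file can stay a few lines long.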

### Switch providers

```yaml
# config/default.yaml — pick the stack you want
models:
  transcription:
    provider: together # together | openai
    model: openai/whisper-large-v3 # together → openai/whisper-large-v3 | openai → whisper-1
  translation:
    provider: together # together | openai
    model: deepseek-ai/DeepSeek-V4-Pro # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
  tts:
    provider: together # together | elevenlabs | openai
    model: cartesia/sonic-3 # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd
```

### Production overrides

A starter `config/prod.yaml` is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included `Dockerfile` + `docker-compose.yml` + `Caddyfile` are how the live demo is hosted — `docker compose up -d --build` after filling `.env` is enough to put a copy of Violin behind auto-HTTPS on any Docker host.

### Environment variables

| Variable | When required | Description |
|----------|---------------|-------------|
| `TOGETHER_API_KEY` | **Required by the default config** (covers every stage) | Together AI API key |
| `OPENAI_API_KEY` | Any stage uses `provider: openai` | Covers `whisper-1`, GPT models, and `tts-1` |
| `ELEVENLABS_API_KEY` | TTS uses `provider: elevenlabs` | ElevenLabs API key |
| `CORS_ORIGINS` | Optional | Comma-separated allowed origins (default: `*`) |

> You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on `openai`) work too; `OPENAI_API_KEY` alone is enough. ElevenLabs covers only TTS, so an ElevenLabs setup also needs the key for whichever provider handles transcription and translation.
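The table amounts to a lookup from each stage's `provider` to an environment variable. A minimal sketch of that check, assuming the `models:` layout shown under Switch providers; `PROVIDER_ENV`, `required_keys`, and `missing_keys` are hypothetical names derived only from this table.

```python
from __future__ import annotations

import os

# Provider name → environment variable, per the table above
PROVIDER_ENV = {
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def required_keys(models: dict) -> set[str]:
    """Collect the env vars needed by every stage of a `models:` section.

    Raises KeyError for a provider not listed in PROVIDER_ENV.
    """
    return {PROVIDER_ENV[stage["provider"]] for stage in models.values()}


def missing_keys(models: dict, env: dict | None = None) -> list[str]:
    """Return required keys that are unset or empty, sorted for stable output."""
    env = os.environ if env is None else env
    return sorted(key for key in required_keys(models) if not env.get(key))
```

With transcription and translation on `together` and TTS on `elevenlabs`, for example, both `TOGETHER_API_KEY` and `ELEVENLABS_API_KEY` are required.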

---

## 🎭 Style profiles

Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use `--style <name>` on the CLI or pass `style` in API requests.

| Style | Tone | TTS speed | Emotion |
|-------|------|-----------|---------|
| `standard` | Faithful translation, natural voice | 1.0× | — |
| `kids` | Rewritten for a 7-year-old, plain language | 1.0× | excited |
| `academic` | Formal register, preserves jargon and honorifics | 0.95× | calm |
| `casual` | Spoken slang, contractions, friendly | 1.1× | content |
| `storyteller` | Vivid, dramatic narration | 0.9× | enthusiastic |
| `news` | Concise, declarative, broadcast-style | 1.0× | neutral |

Add your own by editing `prompts/styles.yaml`.

See all available styles: `violin --style list`.
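Each profile pairs a translation tone with TTS settings. A minimal sketch of the TTS half as an in-memory lookup, useful for validating a `--style` value before starting a job; `StyleProfile` and `resolve_style` are hypothetical, and the real definitions live in `prompts/styles.yaml`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StyleProfile:
    speed: float         # TTS speed multiplier
    emotion: str | None  # TTS emotion tag; None means no override


# Values copied from the style table above
STYLES = {
    "standard": StyleProfile(1.0, None),
    "kids": StyleProfile(1.0, "excited"),
    "academic": StyleProfile(0.95, "calm"),
    "casual": StyleProfile(1.1, "content"),
    "storyteller": StyleProfile(0.9, "enthusiastic"),
    "news": StyleProfile(1.0, "neutral"),
}


def resolve_style(name: str) -> StyleProfile:
    """Look up a profile by name, case-insensitively, listing valid names on failure."""
    try:
        return STYLES[name.strip().lower()]
    except KeyError:
        options = ", ".join(sorted(STYLES))
        raise ValueError(f"unknown style {name!r}; choose one of: {options}") from None
```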

---

## 💻 CLI usage

> Examples use the PyPI-installed `violin` command. If you're running from a git checkout, substitute `uv run main.py` for `violin` (and `uv run run_api.py` for `violin-api`).

```bash
# Basic
violin lecture.mp4 lecture_es.mp4 --language Spanish

# Pick a style
violin talk.mp4 talk_zh.mp4 --language Chinese --style kids

# Pick a specific voice
violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"

# Skip SRT
violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles

# Full replacement (no original audio underneath)
violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover

# Custom config (e.g. switch to OpenAI/ElevenLabs)
violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml
```

### CLI flags

| Flag | Default | Description |
|------|---------|-------------|
| `--language` / `-l` | *(required)* | Target language name (e.g. `Spanish`, `Japanese`) |
| `--voice` / `-v` | auto | TTS voice. Defaults to the primary native voice for the target language |
| `--source-language` | `auto-detect` | Source language hint for translation |
| `--no-subtitles` | off | Skip SRT generation |
| `--voiceover` / `--no-voiceover` | voiceover on | Keep original audio underneath the dub, or full replacement |
| `--style` / `-s` | `standard` | Style profile name. Use `--style list` to see all |
| `--config` / `-c` | `config/default.yaml` | Path to a YAML override file |
| `--timings-out` | off | Write per-step wall-clock timings + cost as JSON |
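The flag table maps naturally onto Python's `argparse`, including the paired `--voiceover` / `--no-voiceover` switch. A hypothetical re-creation for illustration only, assuming the defaults listed above and that `--timings-out` takes a file path; it is not the parser in `main.py`, `build_parser` is an invented name, and it omits the `violin --style list` shortcut.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented violin CLI flags."""
    p = argparse.ArgumentParser(prog="violin")
    p.add_argument("input", help="source video")
    p.add_argument("output", help="dubbed output video")
    p.add_argument("-l", "--language", required=True, help="target language name")
    p.add_argument("-v", "--voice", default=None, help="TTS voice; None means auto")
    p.add_argument("--source-language", default="auto-detect")
    p.add_argument("--no-subtitles", action="store_true", help="skip SRT generation")
    # BooleanOptionalAction (Python 3.9+) generates both --voiceover and --no-voiceover
    p.add_argument("--voiceover", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("-s", "--style", default="standard")
    p.add_argument("-c", "--config", default="config/default.yaml")
    p.add_argument("--timings-out", default=None, help="path for a timings/cost JSON")
    return p
```

`BooleanOptionalAction` keeps the on/off pair in a single `voiceover` attribute, so code downstream only needs to check one boolean.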

---

## 🛰️ Web app & REST API

```bash
violin-api # default dev mode
violin-api --host 0.0.0.0 --port 8080 # bind everywhere
violin-api --config config/prod.yaml # production overrides (requires a git checkout for config/prod.yaml)
```

Core flow: `POST /jobs` to start, `GET /jobs/{id}` to poll, `GET /jobs/{id}/video` and `/srt` to download, `POST /jobs/{id}/chat` for in-video Q&A. Full list with request/response schemas at **`/docs`**.

### Example

```bash
# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
  -F "file=@lecture.mp4" \
  -F "language=Spanish" \
  -F "style=academic" | jq -r .id)

# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'

# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt
```
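The same submit/poll/download flow can be scripted in Python. A minimal sketch of the polling loop, assuming the job JSON exposes `status` as the `jq` filter above suggests; the terminal values `completed` and `failed` are guesses, so check `/docs` for the real schema. `wait_for_job` is hypothetical and takes the HTTP call as a parameter so it can be tested offline.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Assumed terminal states; confirm against the schema at /docs
TERMINAL = {"completed", "failed"}


def wait_for_job(
    get_job: Callable[[], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_job() until its status is terminal, or raise TimeoutError."""
    waited = 0.0
    while True:
        job = get_job()
        if job.get("status") in TERMINAL:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

With `httpx` (already a Violin dependency), `get_job` could be `lambda: httpx.get(f"http://localhost:8000/jobs/{job_id}").json()`.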

Job data lives under `jobs/{id}/`. Set `api.job_ttl_hours` to auto-delete jobs older than N hours (default `0` = disabled; `config/prod.yaml` uses 24h for the public demo).
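The `api.job_ttl_hours` setting implies a periodic sweep of the `jobs/{id}/` directories. A minimal sketch, assuming a job's age is its directory modification time and that `0` disables the sweep as documented; `purge_expired_jobs` is hypothetical, not the code in `api/storage.py`.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def purge_expired_jobs(jobs_root: Path, ttl_hours: float, now: float | None = None) -> list[str]:
    """Delete job directories older than ttl_hours; return the removed job ids."""
    if ttl_hours <= 0 or not jobs_root.is_dir():
        return []  # 0 (the default) disables cleanup
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = []
    for job_dir in jobs_root.iterdir():
        # Age is approximated by the directory's last-modified time
        if job_dir.is_dir() and job_dir.stat().st_mtime < cutoff:
            shutil.rmtree(job_dir)
            removed.append(job_dir.name)
    return sorted(removed)
```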

---

## 🌍 Supported languages

Violin supports **33 target languages**. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs `eleven_v3`).

Ordered by native-speaker population.

| Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
|----------|-------------------------------|---------------------------------|
| Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
| Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
| English | tutorial man / helpful woman | Adam / Sarah |
| Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
| Arabic | middle eastern woman | Faris / Haneen |
| Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
| Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
| Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
| Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
| German | german reporter man / german conversational woman | Daniel / Sina |
| Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
| French | french narrator man / french narrator lady | Lior / Virginie |
| Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
| Polish | polish confident man / polish narrator woman | Gregor / Jola |
| Dutch | dutch confident man / dutch man | Ronald / Jolanda |
| Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |

The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
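The fallback rule fits in a few lines: use the target language's own catalog when one exists, otherwise the English catalog. A minimal sketch, assuming a catalog keyed by language name with (male, female) voice names; the entries are a subset of the Cartesia column above, and `CARTESIA_VOICES` and `pick_voice` are hypothetical.

```python
from __future__ import annotations

# Subset of the Cartesia column above: language → (male voice, female voice)
CARTESIA_VOICES = {
    "English": ("tutorial man", "helpful woman"),
    "Spanish": ("spanish narrator man", "spanish narrator lady"),
    "Japanese": ("japanese male conversational", "japanese woman conversational"),
}


def pick_voice(language: str, gender: str = "female") -> tuple[str, bool]:
    """Return (voice name, is_native) for a target language.

    Languages without a native catalog fall back to the English voices,
    which Cartesia Sonic 3 can use multilingually.
    """
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    native = language in CARTESIA_VOICES
    male, female = CARTESIA_VOICES[language if native else "English"]
    return (male if gender == "male" else female), native
```

Returning `is_native` alongside the voice makes it easy to surface the non-native caveat only for fallback languages.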

---

## 🤝 Contributing

PRs welcome. Got questions or hit a bug? Email **<heyviolinai@gmail.com>** or open an issue.

---

## ⚠️ Disclaimer

This is a personal open-source project, not a Together AI product. Users are responsible for ensuring they have the right to download and translate any content they process. Designed for Creative Commons, public domain, your own recordings, and other content you have permission to use.

---

## 📜 License

[MIT](https://github.com/shang-zhu/violin/blob/main/LICENSE) — use it freely, including commercially.

---

## 🙏 Acknowledgements

Built on top of [Together AI](https://together.ai), [Whisper](https://github.com/openai/whisper), [Cartesia Sonic 3](https://cartesia.ai), [ElevenLabs](https://elevenlabs.io), [FastAPI](https://fastapi.tiangolo.com/), and [ffmpeg](https://ffmpeg.org).