violin-0.1.0a1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. violin-0.1.0a1/.claude/skills/video-translator/SKILL.md +67 -0
  2. violin-0.1.0a1/.dockerignore +17 -0
  3. violin-0.1.0a1/.env.example +21 -0
  4. violin-0.1.0a1/.gitignore +18 -0
  5. violin-0.1.0a1/.python-version +1 -0
  6. violin-0.1.0a1/Caddyfile +12 -0
  7. violin-0.1.0a1/Dockerfile +19 -0
  8. violin-0.1.0a1/LICENSE +21 -0
  9. violin-0.1.0a1/PKG-INFO +298 -0
  10. violin-0.1.0a1/README.md +262 -0
  11. violin-0.1.0a1/api/__init__.py +0 -0
  12. violin-0.1.0a1/api/app.py +94 -0
  13. violin-0.1.0a1/api/config.py +15 -0
  14. violin-0.1.0a1/api/models.py +87 -0
  15. violin-0.1.0a1/api/routes/__init__.py +0 -0
  16. violin-0.1.0a1/api/routes/catalog.py +190 -0
  17. violin-0.1.0a1/api/routes/chat.py +39 -0
  18. violin-0.1.0a1/api/routes/files.py +133 -0
  19. violin-0.1.0a1/api/routes/jobs.py +228 -0
  20. violin-0.1.0a1/api/static/demo/hassan_gpt_oss_en.mp4 +0 -0
  21. violin-0.1.0a1/api/static/demo/hassan_gpt_oss_ko.mp4 +0 -0
  22. violin-0.1.0a1/api/static/demo/percy_en.mp4 +0 -0
  23. violin-0.1.0a1/api/static/demo/percy_zh.mp4 +0 -0
  24. violin-0.1.0a1/api/static/demo/posters/dario_interview_it.jpg +0 -0
  25. violin-0.1.0a1/api/static/demo/posters/hassan_gpt_oss_ko.jpg +0 -0
  26. violin-0.1.0a1/api/static/demo/posters/percy_zh.jpg +0 -0
  27. violin-0.1.0a1/api/static/index.html +1631 -0
  28. violin-0.1.0a1/api/stats.py +141 -0
  29. violin-0.1.0a1/api/storage.py +241 -0
  30. violin-0.1.0a1/api/usage.py +61 -0
  31. violin-0.1.0a1/api/video_chat.py +185 -0
  32. violin-0.1.0a1/api/worker.py +237 -0
  33. violin-0.1.0a1/assets/demo_en.mp4 +0 -0
  34. violin-0.1.0a1/assets/outcome.png +0 -0
  35. violin-0.1.0a1/config/default.yaml +88 -0
  36. violin-0.1.0a1/config/other_api.yaml +18 -0
  37. violin-0.1.0a1/config/prod.yaml +21 -0
  38. violin-0.1.0a1/docker-compose.yml +26 -0
  39. violin-0.1.0a1/main.py +194 -0
  40. violin-0.1.0a1/pipeline/__init__.py +0 -0
  41. violin-0.1.0a1/pipeline/config.py +49 -0
  42. violin-0.1.0a1/pipeline/costs.py +123 -0
  43. violin-0.1.0a1/pipeline/extractor.py +67 -0
  44. violin-0.1.0a1/pipeline/ffmpeg_utils.py +47 -0
  45. violin-0.1.0a1/pipeline/languages.py +48 -0
  46. violin-0.1.0a1/pipeline/llm_client.py +212 -0
  47. violin-0.1.0a1/pipeline/merger.py +642 -0
  48. violin-0.1.0a1/pipeline/orchestrator.py +219 -0
  49. violin-0.1.0a1/pipeline/pricing.py +47 -0
  50. violin-0.1.0a1/pipeline/styles.py +60 -0
  51. violin-0.1.0a1/pipeline/transcriber.py +451 -0
  52. violin-0.1.0a1/pipeline/translator.py +281 -0
  53. violin-0.1.0a1/pipeline/tts.py +122 -0
  54. violin-0.1.0a1/pipeline/tts_elevenlabs.py +283 -0
  55. violin-0.1.0a1/pipeline/tts_openai.py +141 -0
  56. violin-0.1.0a1/pipeline/tts_together.py +150 -0
  57. violin-0.1.0a1/prompts/__init__.py +34 -0
  58. violin-0.1.0a1/prompts/styles.yaml +118 -0
  59. violin-0.1.0a1/prompts/translate.yaml +106 -0
  60. violin-0.1.0a1/prompts/video_chat.yaml +14 -0
  61. violin-0.1.0a1/prompts/voice_match.yaml +16 -0
  62. violin-0.1.0a1/pyproject.toml +62 -0
  63. violin-0.1.0a1/run_api.py +65 -0
  64. violin-0.1.0a1/uv.lock +940 -0
violin-0.1.0a1/.claude/skills/video-translator/SKILL.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ name: video-translator
+ description: Dub a video into another language and generate subtitles using the default Together + Cartesia stack. Trigger when the user wants to translate / dub / voice-over a video file, or generate subtitles for it. Handles `.mp4` / `.mkv` / `.webm`. Installs as the `violin` CLI (and `violin-api` for the FastAPI server) via `uv tool install`. For premium models (OpenAI / ElevenLabs) or custom configs, point the user to the repo.
+ allowed-tools: Bash, Read
+ ---
+
+ # Violin — operating skill
+
+ This skill always uses the default config (Together for translation, `cartesia/sonic-3` for TTS). If the user asks for OpenAI, ElevenLabs, or custom configs, **stop and point them to the Violin repo** — those flows aren't supported through the global CLI.
+
+ ## Pre-flight
+
+ Run these silently first. Abort if any check fails:
+
+ ```bash
+ command -v violin            # 1. CLI on PATH
+ test -f "<input>"            # 2. Input exists
+ printenv TOGETHER_API_KEY    # 3. Key available
+ ```
+
+ If `violin` is missing: tell the user to `uv tool install .` from a Violin checkout. Do not auto-install.
+
+ If `TOGETHER_API_KEY` is missing:
+ - Inside the Violin repo → populate `.env` (auto-loaded)
+ - Elsewhere → `export TOGETHER_API_KEY=...` in `~/.zshrc` / `~/.bashrc`, then `source` it
+
+ ## Decisions
+
+ - **CLI vs API**: single run-and-wait file → CLI (`violin`). Multi-job / HTTP / web UI → API server (`violin-api`); print the command, don't auto-start it.
+ - **Style** (`--style`): default `standard`. Kids content → `kids`, formal/lecture → `academic`, casual → `casual`, dramatic → `storyteller`, news → `news`. Run `violin --style list` if unsure.
+ - **Voiceover**: keep the default (mix dubbed audio over a quiet original). Use `--no-voiceover` only when the user explicitly says "replace audio entirely".
+
+ ## Run
+
+ ```bash
+ violin <input> <output> --language <Lang> [flags]
+ ```
+
+ ## Flags
+
+ | Flag | Default | When to set |
+ |------|---------|-------------|
+ | `--language` / `-l` | *required* | Target language (e.g. `Chinese`, `Spanish`, `Japanese`). |
+ | `--voice` / `-v` | auto (native voice picked by `preferences.voice_gender`) | Only when the user names a specific voice from the catalog (e.g. `"warm female narrator"`). Otherwise omit and let the default kick in. |
+ | `--source-language` | `auto-detect` | Only if Whisper mis-detects the source language. |
+ | `--style` / `-s` | `standard` | See Decisions above. |
+ | `--no-subtitles` | off | User says "no SRT" / "video only". |
+ | `--no-voiceover` | off | User says "replace original audio entirely". |
+ | `--config` / `-c` | `config/default.yaml` | Don't use through this skill — repo-only flow. |
+ | `--timings-out` | off | Only when the user wants a per-step timing JSON for debugging / benchmarking. |
+
+ ## Language coverage
+
+ 33 target languages total. **16** ship with handpicked native-speaker voices: Chinese, Spanish, English, Hindi, Arabic, Portuguese, Russian, Japanese, Turkish, German, Korean, French, Italian, Polish, Dutch, Swedish. The other **17** fall back to the English voice catalog (multilingual under Cartesia Sonic 3) — quality is decent but the voice isn't a native speaker. Mention this caveat only if the user is translating to a fallback language and asks about voice quality.
+
+ ## Report back
+
+ - Output video path + SRT path (printed by the run).
+ - Total cost (printed at the end — surface it, don't hide it).
+ - If voiceover was on, mention the `_original.m4a` sidecar.
+
+ ## Don'ts
+
+ - Don't run on multi-GB videos without first quoting the rough cost (audio length × per-provider rates in `pipeline/pricing.py`).
+ - Don't fabricate a "subtitles-only" mode — the CLI requires the full pipeline. If the user only wants SRT, run the full pipeline and hand them just the `.srt`, warning them of the cost first.
+ - Don't try to switch to OpenAI or ElevenLabs from this skill. Point the user to the repo + `--config config/other_api.yaml` (or their own override).
+ - Don't paraphrase the README. For supported languages (33), voice catalog, and full flag docs, point them at `README.md` or `violin --help`.
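The cost quote the first Don't asks for is simple arithmetic. A minimal editorial sketch, assuming per-stage billing on audio duration — `estimate_cost` is a hypothetical helper and the rate figures below are made up; the real rates live in `pipeline/pricing.py`:

```python
# Hypothetical helper, not part of the violin package: quotes a rough
# pipeline cost as audio length x summed per-minute provider rates.
def estimate_cost(duration_seconds: float, rates_per_minute: dict[str, float]) -> float:
    """Rough total cost: every stage is billed on the audio duration."""
    minutes = duration_seconds / 60.0
    return round(sum(rate * minutes for rate in rates_per_minute.values()), 4)

# Made-up rates (USD per audio minute) for illustration only:
rates = {"transcription": 0.0015, "translation": 0.002, "tts": 0.03}
print(estimate_cost(45 * 60, rates))  # 45-minute talk → 1.5075
```

Quote a number like this before launching a long run, then let the real cost printed at the end confirm it.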
violin-0.1.0a1/.dockerignore ADDED
@@ -0,0 +1,17 @@
+ .git
+ .gitignore
+ .env
+ .env.*
+ !.env.example
+ jobs/
+ __pycache__
+ *.pyc
+ *.pyo
+ .venv
+ .mypy_cache
+ .ruff_cache
+ *.egg-info
+ dist/
+ build/
+ *.md
+ !README.md
violin-0.1.0a1/.env.example ADDED
@@ -0,0 +1,21 @@
+ # ── API keys ─────────────────────────────────────────────────
+ # Only fill in the keys for the providers you actually use — see
+ # config/default.yaml `models:` section. The default Together stack
+ # (Whisper + LLM + Cartesia TTS) needs only TOGETHER_API_KEY.
+
+ # Together AI — https://api.together.ai
+ TOGETHER_API_KEY=
+
+ # OpenAI — needed when any stage uses `provider: openai`
+ # (whisper-1, GPT models, tts-1, vision chat). https://platform.openai.com
+ OPENAI_API_KEY=
+
+ # ElevenLabs — needed when TTS uses `provider: elevenlabs`
+ # https://elevenlabs.io/app/settings/api-keys
+ ELEVENLABS_API_KEY=
+
+
+ # ── Optional ─────────────────────────────────────────────────
+
+ # CORS allow-list for the web app (default: * — any origin)
+ # CORS_ORIGINS=https://yourdomain.com
violin-0.1.0a1/.gitignore ADDED
@@ -0,0 +1,18 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+ .env
+ examples/
+
+ # Logs
+ *.log
+ jobs/
+
+ .DS_Store
violin-0.1.0a1/.python-version ADDED
@@ -0,0 +1 @@
+ 3.13
violin-0.1.0a1/Caddyfile ADDED
@@ -0,0 +1,12 @@
+ # Replace with your actual domain name.
+ # Caddy will automatically provision a Let's Encrypt TLS certificate
+ # once the domain's DNS A record points to this server's public IP.
+ #
+ # For local testing without a domain, replace with:
+ #   :80 {
+ #       reverse_proxy violin:8000
+ #   }
+
+ violin-ai.com {
+     reverse_proxy violin:8000
+ }
violin-0.1.0a1/Dockerfile ADDED
@@ -0,0 +1,19 @@
+ FROM python:3.13-slim AS base
+
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends libsndfile1 ffmpeg && \
+     rm -rf /var/lib/apt/lists/*
+
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+
+ WORKDIR /app
+
+ COPY pyproject.toml uv.lock ./
+ RUN uv sync --frozen --no-dev --no-install-project
+
+ COPY . .
+ RUN uv sync --frozen --no-dev
+
+ EXPOSE 8000
+
+ CMD ["uv", "run", "run_api.py", "--host", "0.0.0.0", "--port", "8000", "--config", "config/prod.yaml"]
violin-0.1.0a1/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Violin contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
violin-0.1.0a1/PKG-INFO ADDED
@@ -0,0 +1,298 @@
+ Metadata-Version: 2.4
+ Name: violin
+ Version: 0.1.0a1
+ Summary: Open-source video dubbing — translate any video into 33 languages with native-sounding voice-over and synced subtitles.
+ Project-URL: Repository, https://github.com/shang-zhu/Violin
+ Project-URL: Issues, https://github.com/shang-zhu/Violin/issues
+ Author: Shang Zhu, Qinghong Lin
+ License: MIT
+ License-File: LICENSE
+ Keywords: dubbing,elevenlabs,subtitles,together-ai,translation,tts,video,whisper
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: End Users/Desktop
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+ Classifier: Topic :: Multimedia :: Video
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.13
+ Requires-Dist: aiofiles>=25.1.0
+ Requires-Dist: elevenlabs>=1.0.0
+ Requires-Dist: fastapi>=0.135.2
+ Requires-Dist: ffmpeg-python>=0.2.0
+ Requires-Dist: httpx>=0.28.1
+ Requires-Dist: imageio-ffmpeg>=0.6.0
+ Requires-Dist: openai>=1.0.0
+ Requires-Dist: python-dotenv>=1.2.2
+ Requires-Dist: python-multipart>=0.0.22
+ Requires-Dist: pyyaml>=6.0.3
+ Requires-Dist: soundfile>=0.13.1
+ Requires-Dist: together>=2.5.0
+ Requires-Dist: uvicorn>=0.42.0
+ Requires-Dist: yt-dlp>=2025.1.0
+ Requires-Dist: Description-Content-Type: text/markdown
+
+ # 🎻 Violin
+
+ **Open-source video dubbing — translate any video into 33 languages with natural-sounding voice-over and synced subtitles.**
+
+ [🌐 Live demo](https://violin-ai.com) · [📜 MIT License](LICENSE)
+
+ <!-- ![demo](assets/outcome.png) -->
+
+ Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.
+
+ Available as a **CLI**, a **FastAPI web app**, and a **Claude Code skill**.
+
+ ---
+
+ ## ✨ Features
+
+ - **33 target languages** with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
+ - **In-video Q&A** — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
+ - **Natural-language voice picker** — describe the voice you want, an LLM picks from the catalog
+ - **6 style profiles** *(experimental)* — standard / kids / academic / casual / storyteller / news
+ - **Pluggable stack** — Together / OpenAI / ElevenLabs interchangeable for every stage, one YAML
+
+ ---
+
+ ## 🚀 Quick start
+
+ ### Try it without installing anything
+
+ The live demo runs at **<https://violin-ai.com>** — drop a short clip in, get a dubbed video out in a few minutes.
+
+ ### Run locally
+
+ Requires **Python 3.13+** and **ffmpeg** on PATH.
+
+ ```bash
+ git clone https://github.com/shang-zhu/violin.git
+ cd violin
+ uv sync
+ cp .env.example .env   # then fill in TOGETHER_API_KEY (get one at https://api.together.ai)
+ ```
+
+ Three ways to use it:
+
+ **1. CLI** — translate one file:
+
+ ```bash
+ uv run main.py assets/demo_en.mp4 assets/demo_en_zh.mp4 --language Chinese
+ ```
+
+ **2. Web app** — full REST API + browser UI:
+
+ ```bash
+ uv run run_api.py
+ # → http://127.0.0.1:8000 (browser UI)
+ # → http://127.0.0.1:8000/docs (interactive API docs)
+ ```
+
+ **3. Claude Code skill** — invoke from any Claude Code session:
+
+ ```bash
+ cp -r .claude/skills/video-translator ~/.claude/skills/
+ claude
+ > please use the violin skill to translate assets/demo_en.mp4 into Chinese
+ ```
+
+ ---
+
+ ## 🎬 How Violin works
+
+ ```
+ Video
+
+ ├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
+
+ ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
+
+ ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
+
+ ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
+
+ └─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
+                                  concat with freeze-frame fallback,
+                                  single-pass AAC encode the audio track,
+                                  write output mp4 + optional SRT
+ ```
+
+ Key engineering decisions worth a look if you're forking:
+
+ - **`pipeline/transcriber.py`** — uses Whisper's word-level timestamps to split into precise sentence boundaries. Has a hallucination filter that re-uses Whisper's own `no_speech_prob` segment metadata (no hand-tuned heuristics).
+ - **`pipeline/merger.py`** — concatenates speed-adjusted video chunks, but builds the audio track *once* from concatenated PCM and encodes AAC at the end. This is the difference between "subtitles drift 1–2 s by the end of an 8 min video" and "perfectly synced throughout."
+ - **`pipeline/tts_*.py`** — Cartesia + ElevenLabs backends share an interface. ElevenLabs side ships 21 premade voices (multilingual via `eleven_v3`) plus 15 hand-picked native-speaker voices from the Voice Library.
+
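The last pipeline step writes an optional SRT next to the mp4. As a rough editorial illustration of that output format only (hypothetical helpers, not the package's actual `pipeline/merger.py` code), timed sentence segments map to numbered cues with `HH:MM:SS,mmm` timestamps:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Turn (start_s, end_s, text) triples into an SRT document."""
    cues = [
        f"{i}\n{srt_timestamp(a)} --> {srt_timestamp(b)}\n{text}\n"
        for i, (a, b, text) in enumerate(segments, start=1)
    ]
    return "\n".join(cues)

print(to_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "World")]))
```

Because the real pipeline derives these segment times from Whisper's word-level timestamps, the SRT stays aligned with the dubbed audio track.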
+ ---
+
+ ## ⚙️ Configuration
+
+ All defaults live in `config/default.yaml`. Override with `--config my.yaml` (only the keys you want to change need to appear in the override file — values deep-merge).
+
+ ### Switch providers
+
+ ```yaml
+ # config/default.yaml — pick the stack you want
+ models:
+   transcription:
+     provider: together                  # together | openai
+     model: openai/whisper-large-v3      # together → openai/whisper-large-v3 | openai → whisper-1
+   translation:
+     provider: together                  # together | openai
+     model: deepseek-ai/DeepSeek-V4-Pro  # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
+   tts:
+     provider: together                  # together | elevenlabs | openai
+     model: cartesia/sonic-3             # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd
+ ```
+
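The deep-merge described above (only changed keys appear in the override; nested values merge rather than replace wholesale) can be pictured with a small sketch — an illustrative helper, not the package's actual merge code:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay override onto base: dicts merge, scalars/lists replace."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# An override file only needs the keys it changes:
default = {"models": {"tts": {"provider": "together", "model": "cartesia/sonic-3"}}}
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}
print(deep_merge(default, override)["models"]["tts"])
# → {'provider': 'elevenlabs', 'model': 'eleven_v3'}
```

So a two-line YAML override is enough to swap one stage's provider while every other default stays intact.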
+ ### Production overrides
+
+ A starter `config/prod.yaml` is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included `Dockerfile` + `docker-compose.yml` + `Caddyfile` are how the live demo is hosted — `docker compose up -d --build` after filling `.env` is enough to put a copy of Violin behind auto-HTTPS on any Docker host.
+
+ ### Environment variables
+
+ | Variable | When required | Description |
+ |----------|---------------|-------------|
+ | `TOGETHER_API_KEY` | **Recommended** — covers every stage with the default config | Together AI API key |
+ | `OPENAI_API_KEY` | Any stage uses `provider: openai` | Covers `whisper-1`, GPT models, and `tts-1` |
+ | `ELEVENLABS_API_KEY` | TTS uses `provider: elevenlabs` | ElevenLabs API key |
+ | `CORS_ORIGINS` | Optional | Comma-separated allowed origins (default: `*`) |
+
+ > You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on `openai`) work too — `OPENAI_API_KEY` alone is enough. Same idea for ElevenLabs.
+
+ ---
+
+ ## 🎭 Style profiles
+
+ Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use `--style <name>` on the CLI or pass `style` in API requests.
+
+ | Style | Tone | TTS speed | Emotion |
+ |-------|------|-----------|---------|
+ | `standard` | Faithful translation, natural voice | 1.0× | — |
+ | `kids` | Rewritten for a 7-year-old, plain language | 1.0× | excited |
+ | `academic` | Formal register, preserves jargon and honorifics | 0.95× | calm |
+ | `casual` | Spoken slang, contractions, friendly | 1.1× | content |
+ | `storyteller` | Vivid, dramatic narration | 0.9× | enthusiastic |
+ | `news` | Concise, declarative, broadcast-style | 1.0× | neutral |
+
+ Add your own by editing `prompts/styles.yaml`.
+
+ See all available styles: `uv run main.py --style list`.
+
+ ---
+
+ ## 💻 CLI usage
+
+ ```bash
+ # Basic
+ uv run main.py lecture.mp4 lecture_es.mp4 --language Spanish
+
+ # Pick a style
+ uv run main.py talk.mp4 talk_zh.mp4 --language Chinese --style kids
+
+ # Pick a specific voice
+ uv run main.py lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"
+
+ # Skip SRT
+ uv run main.py lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles
+
+ # Full replacement (no original audio underneath)
+ uv run main.py lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover
+
+ # Custom config (e.g. switch to OpenAI/ElevenLabs)
+ uv run main.py lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml
+ ```
+
+ ### CLI flags
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `--language` / `-l` | *(required)* | Target language name (e.g. `Spanish`, `Japanese`) |
+ | `--voice` / `-v` | auto | TTS voice. Defaults to the primary native voice for the target language |
+ | `--source-language` | `auto-detect` | Source language hint for translation |
+ | `--no-subtitles` | off | Skip SRT generation |
+ | `--voiceover` / `--no-voiceover` | voiceover on | Keep original audio underneath the dub, or full replacement |
+ | `--style` / `-s` | `standard` | Style profile name. Use `--style list` to see all |
+ | `--config` / `-c` | `config/default.yaml` | Path to a YAML override file |
+ | `--timings-out` | off | Write per-step wall-clock timings + cost as JSON |
+
+ ---
+
+ ## 🛰️ Web app & REST API
+
+ ```bash
+ uv run run_api.py                              # default dev mode
+ uv run run_api.py --host 0.0.0.0 --port 8080   # bind everywhere
+ uv run run_api.py --config config/prod.yaml    # production overrides
+ ```
+
+ Core flow: `POST /jobs` to start, `GET /jobs/{id}` to poll, `GET /jobs/{id}/video` and `/srt` to download, `POST /jobs/{id}/chat` for in-video Q&A. Full list with request/response schemas at **`/docs`**.
+
+ ### Example
+
+ ```bash
+ # Submit
+ JOB=$(curl -s -X POST http://localhost:8000/jobs \
+   -F "file=@lecture.mp4" \
+   -F "language=Spanish" \
+   -F "style=academic" | jq -r .id)
+
+ # Poll
+ curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'
+
+ # Download
+ curl -OJ http://localhost:8000/jobs/$JOB/video
+ curl -OJ http://localhost:8000/jobs/$JOB/srt
+ ```
+
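The poll step can be scripted instead of re-running curl by hand. A minimal editor-added sketch: it assumes only that the `GET /jobs/{id}` response carries a `status` field (as in the curl example above), and the terminal status names here are guesses — check `/docs` for the real ones. The fetch callable is injected so the loop itself stays testable without a server:

```python
import time
from typing import Callable

def poll_job(fetch: Callable[[], dict], interval: float = 2.0,
             terminal: frozenset = frozenset({"done", "failed"})) -> dict:
    """Call fetch() (e.g. a GET /jobs/{id} request) until a terminal status."""
    while True:
        job = fetch()
        if job.get("status") in terminal:
            return job
        time.sleep(interval)
```

In practice `fetch` would wrap something like `httpx.get(f"{base}/jobs/{job_id}").json()`, and `interval` should stay coarse — dubbing jobs run for minutes, not milliseconds.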
+ Job data lives under `jobs/{id}/`. Set `api.job_ttl_hours` to auto-delete jobs older than N hours (default `0` = disabled; `config/prod.yaml` uses 24h for the public demo).
+
+ ---
+
+ ## 🌍 Supported languages
+
+ Violin supports **33 target languages**. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs `eleven_v3`).
+
+ Ordered by native-speaker population.
+
+ | Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
+ |----------|-------------------------------|---------------------------------|
+ | Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
+ | Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
+ | English | tutorial man / helpful woman | Adam / Sarah |
+ | Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
+ | Arabic | middle eastern woman | Faris / Haneen |
+ | Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
+ | Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
+ | Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
+ | Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
+ | German | german reporter man / german conversational woman | Daniel / Sina |
+ | Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
+ | French | french narrator man / french narrator lady | Lior / Virginie |
+ | Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
+ | Polish | polish confident man / polish narrator woman | Gregor / Jola |
+ | Dutch | dutch confident man / dutch man | Ronald / Jolanda |
+ | Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |
+
+ The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
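The native-voice-or-fallback rule above amounts to a catalog lookup. An illustrative editor-added sketch — the voice names come from the table, but the function and the truncated catalog are hypothetical, not the package's real lookup:

```python
# Abridged catalog for illustration; (male, female) pairs from the table above.
NATIVE_VOICES = {
    "Chinese": ("chinese commercial man", "chinese female conversational"),
    "Spanish": ("spanish narrator man", "spanish narrator lady"),
    "English": ("tutorial man", "helpful woman"),
}

def pick_voice(language: str, gender: str = "male") -> str:
    """Native voice when the language ships one; otherwise the English
    catalog, which is multilingual under Cartesia Sonic 3."""
    male, female = NATIVE_VOICES.get(language, NATIVE_VOICES["English"])
    return male if gender == "male" else female

print(pick_voice("Spanish", "female"))  # → spanish narrator lady
print(pick_voice("Vietnamese"))         # → tutorial man (English fallback)
```

This is why fallback-language dubs still work end to end: the synthesis voice is fluent, just not a native speaker.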
+
+ ---
+
+ ## 🤝 Contributing
+
+ PRs welcome. Got questions or hit a bug? Email **<heyviolinai@gmail.com>** or open an issue.
+
+ ---
+
+ ## 📜 License
+
+ [MIT](LICENSE) — use it freely, including commercially.
+
+ ---
+
+ ## 🙏 Acknowledgements
+
+ Built on top of [Together AI](https://together.ai), [Whisper](https://github.com/openai/whisper), [Cartesia Sonic 3](https://cartesia.ai), [ElevenLabs](https://elevenlabs.io), [FastAPI](https://fastapi.tiangolo.com/), and [ffmpeg](https://ffmpeg.org).