violin 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. violin-0.1.0/.claude/skills/video-translator/SKILL.md +67 -0
  2. violin-0.1.0/.dockerignore +17 -0
  3. violin-0.1.0/.env.example +21 -0
  4. violin-0.1.0/.gitignore +18 -0
  5. violin-0.1.0/.python-version +1 -0
  6. violin-0.1.0/Caddyfile +12 -0
  7. violin-0.1.0/Dockerfile +19 -0
  8. violin-0.1.0/LICENSE +21 -0
  9. violin-0.1.0/PKG-INFO +315 -0
  10. violin-0.1.0/README.md +276 -0
  11. violin-0.1.0/api/__init__.py +0 -0
  12. violin-0.1.0/api/app.py +109 -0
  13. violin-0.1.0/api/config.py +15 -0
  14. violin-0.1.0/api/models.py +87 -0
  15. violin-0.1.0/api/routes/__init__.py +0 -0
  16. violin-0.1.0/api/routes/catalog.py +190 -0
  17. violin-0.1.0/api/routes/chat.py +39 -0
  18. violin-0.1.0/api/routes/files.py +133 -0
  19. violin-0.1.0/api/routes/jobs.py +228 -0
  20. violin-0.1.0/api/static/demo/hassan_gpt_oss_en.mp4 +0 -0
  21. violin-0.1.0/api/static/demo/hassan_gpt_oss_ko.mp4 +0 -0
  22. violin-0.1.0/api/static/demo/percy_en.mp4 +0 -0
  23. violin-0.1.0/api/static/demo/percy_zh.mp4 +0 -0
  24. violin-0.1.0/api/static/demo/posters/dario_interview_it.jpg +0 -0
  25. violin-0.1.0/api/static/demo/posters/hassan_gpt_oss_ko.jpg +0 -0
  26. violin-0.1.0/api/static/demo/posters/percy_zh.jpg +0 -0
  27. violin-0.1.0/api/static/index.html +1644 -0
  28. violin-0.1.0/api/stats.py +141 -0
  29. violin-0.1.0/api/storage.py +241 -0
  30. violin-0.1.0/api/usage.py +61 -0
  31. violin-0.1.0/api/video_chat.py +185 -0
  32. violin-0.1.0/api/worker.py +237 -0
  33. violin-0.1.0/assets/demo_en.mp4 +0 -0
  34. violin-0.1.0/assets/outcome.png +0 -0
  35. violin-0.1.0/config/default.yaml +88 -0
  36. violin-0.1.0/config/other_api.yaml +18 -0
  37. violin-0.1.0/config/prod.yaml +21 -0
  38. violin-0.1.0/docker-compose.yml +26 -0
  39. violin-0.1.0/main.py +219 -0
  40. violin-0.1.0/pipeline/__init__.py +0 -0
  41. violin-0.1.0/pipeline/config.py +49 -0
  42. violin-0.1.0/pipeline/costs.py +123 -0
  43. violin-0.1.0/pipeline/extractor.py +67 -0
  44. violin-0.1.0/pipeline/ffmpeg_utils.py +47 -0
  45. violin-0.1.0/pipeline/languages.py +48 -0
  46. violin-0.1.0/pipeline/llm_client.py +212 -0
  47. violin-0.1.0/pipeline/merger.py +642 -0
  48. violin-0.1.0/pipeline/orchestrator.py +219 -0
  49. violin-0.1.0/pipeline/pricing.py +47 -0
  50. violin-0.1.0/pipeline/styles.py +60 -0
  51. violin-0.1.0/pipeline/transcriber.py +451 -0
  52. violin-0.1.0/pipeline/translator.py +281 -0
  53. violin-0.1.0/pipeline/tts.py +122 -0
  54. violin-0.1.0/pipeline/tts_elevenlabs.py +283 -0
  55. violin-0.1.0/pipeline/tts_openai.py +141 -0
  56. violin-0.1.0/pipeline/tts_together.py +150 -0
  57. violin-0.1.0/prompts/__init__.py +34 -0
  58. violin-0.1.0/prompts/styles.yaml +118 -0
  59. violin-0.1.0/prompts/translate.yaml +106 -0
  60. violin-0.1.0/prompts/video_chat.yaml +14 -0
  61. violin-0.1.0/prompts/voice_match.yaml +16 -0
  62. violin-0.1.0/pyproject.toml +66 -0
  63. violin-0.1.0/run_api.py +65 -0
  64. violin-0.1.0/uv.lock +1404 -0
violin-0.1.0/.claude/skills/video-translator/SKILL.md ADDED
@@ -0,0 +1,67 @@
1
+ ---
2
+ name: video-translator
3
+ description: Dub a video into another language and generate subtitles using the default Together + Cartesia stack. Trigger when the user wants to translate / dub / voice-over a video file, or generate subtitles for it. Handles `.mp4` / `.mkv` / `.webm`. Installs as the `violin` CLI (and `violin-api` for the FastAPI server) via `uv tool install`. For alternative models (OpenAI / ElevenLabs) or custom configs, point the user to the repo: https://github.com/shang-zhu/violin.
4
+ allowed-tools: Bash, Read
5
+ ---
6
+
7
+ # Violin — operating skill
8
+
9
+ Always uses the default config (Together for translation, `cartesia/sonic-3` for TTS). If the user asks for OpenAI, ElevenLabs, or custom configs, **stop and point them to the Violin repo** — those flows aren't supported through the global CLI.
10
+
11
+ ## Pre-flight
12
+
13
+ Run these silently first. Abort if any fails:
14
+
15
+ ```bash
16
+ command -v violin # 1. CLI on PATH
17
+ test -f "<input>" # 2. Input exists
18
+ printenv TOGETHER_API_KEY # 3. Key available
19
+ ```
20
+
21
+ If `violin` is missing: tell the user to `uv tool install violin`, then `violin --install-skill` to refresh this skill file. Do not auto-install.
22
+
23
+ If `TOGETHER_API_KEY` is missing:
24
+ - Inside the Violin repo → populate `.env` (auto-loaded)
25
+ - Elsewhere → `export TOGETHER_API_KEY=...` in `~/.zshrc` / `~/.bashrc`, then `source` it
26
+
27
+ ## Decisions
28
+
29
+ - **CLI vs API**: single run-and-wait file → CLI (`violin`). Multi-job / HTTP / web UI → API server (`violin-api`); print the command, don't auto-start it.
30
+ - **Style** (`--style`): default `standard`. Kids content → `kids`, formal/lecture → `academic`, casual → `casual`, dramatic → `storyteller`, news → `news`. Run `violin --style list` if unsure.
31
+ - **Voiceover**: keep default (mix dubbed audio over a quiet original). Use `--no-voiceover` only when the user explicitly says "replace audio entirely".
32
+
33
+ ## Run
34
+
35
+ ```bash
36
+ violin <input> <output> --language <Lang> [flags]
37
+ ```
38
+
39
+ ## Flags
40
+
41
+ | Flag | Default | When to set |
42
+ |------|---------|-------------|
43
+ | `--language` / `-l` | *required* | Target language (e.g. `Chinese`, `Spanish`, `Japanese`). |
44
+ | `--voice` / `-v` | auto (native voice picked by `preferences.voice_gender`) | Only when the user names a specific voice from the catalog (e.g. `"warm female narrator"`). Otherwise omit and let the default kick in. |
45
+ | `--source-language` | `auto-detect` | Only if Whisper mis-detects the source language. |
46
+ | `--style` / `-s` | `standard` | See Decisions above. |
47
+ | `--no-subtitles` | off | User says "no SRT" / "video only". |
48
+ | `--no-voiceover` | off | User says "replace original audio entirely". |
49
+ | `--config` / `-c` | `config/default.yaml` | Don't use through this skill — repo-only flow. |
50
+ | `--timings-out` | off | Only when the user wants a per-step timing JSON for debugging / benchmarking. |
51
+
52
+ ## Language coverage
53
+
54
+ 33 target languages total. **16** ship with handpicked native-speaker voices: Chinese, Spanish, English, Hindi, Arabic, Portuguese, Russian, Japanese, Turkish, German, Korean, French, Italian, Polish, Dutch, Swedish. The other **17** fall back to the English voice catalog (multilingual under Cartesia Sonic 3) — quality is decent but the voice isn't a native speaker. Mention this caveat only if the user is translating to a fallback language and asks about voice quality.
55
+
56
+ ## Report back
57
+
58
+ - Output video path + SRT path (printed by the run).
59
+ - Total cost (printed at end — surface, don't hide).
60
+ - If voiceover was on, mention the `_original.m4a` sidecar.
61
+
62
+ ## Don'ts
63
+
64
+ - Don't run on multi-GB videos without first quoting the rough cost (audio length × per-provider rates in `pipeline/pricing.py`).
65
+ - Don't fabricate a "subtitles-only" mode — the CLI requires the full pipeline. If the user only wants SRT, run the full pipeline and hand them just the `.srt`, warning them of the cost first.
66
+ - Don't try to switch to OpenAI or ElevenLabs from this skill. Point the user to the repo + `--config config/other_api.yaml` (or their own override).
67
+ - Don't paraphrase the README. For supported languages (33), voice catalog, and full flag docs, point them at `README.md` or `violin --help`.
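The three pre-flight checks in SKILL.md (CLI on PATH, input file exists, key available) can also be expressed in Python. This is a minimal sketch and is not part of the package: the function name `preflight` and its `env` / `which` parameters are hypothetical, added only so the checks can be exercised without touching the real environment. Unlike `printenv`, the sketch also treats an empty `TOGETHER_API_KEY` as missing.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, List, Mapping, Optional


def preflight(
    input_path: str,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of failed checks; an empty list means all three passed.

    Hypothetical helper mirroring the shell checks in SKILL.md.
    """
    env = os.environ if env is None else env
    failures = []

    # 1. CLI on PATH (shell: command -v violin)
    if which("violin") is None:
        failures.append("violin CLI not on PATH (suggest: uv tool install violin)")

    # 2. Input exists (shell: test -f "<input>")
    if not Path(input_path).is_file():
        failures.append(f"input file not found: {input_path}")

    # 3. Key available (shell: printenv TOGETHER_API_KEY);
    #    an empty value is treated as missing here, which is stricter than printenv.
    if not env.get("TOGETHER_API_KEY"):
        failures.append("TOGETHER_API_KEY is not set")

    return failures
```

A caller would abort the run when the returned list is non-empty and show the messages to the user, rather than attempting an automatic install.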
violin-0.1.0/.dockerignore ADDED
@@ -0,0 +1,17 @@
1
+ .git
2
+ .gitignore
3
+ .env
4
+ .env.*
5
+ !.env.example
6
+ jobs/
7
+ __pycache__
8
+ *.pyc
9
+ *.pyo
10
+ .venv
11
+ .mypy_cache
12
+ .ruff_cache
13
+ *.egg-info
14
+ dist/
15
+ build/
16
+ *.md
17
+ !README.md
violin-0.1.0/.env.example ADDED
@@ -0,0 +1,21 @@
1
+ # ── API keys ─────────────────────────────────────────────────
2
+ # Only fill in the keys for the providers you actually use — see
3
+ # config/default.yaml `models:` section. The default Together stack
4
+ # (Whisper + LLM + Cartesia TTS) needs only TOGETHER_API_KEY.
5
+
6
+ # Together AI — https://api.together.ai
7
+ TOGETHER_API_KEY=
8
+
9
+ # OpenAI — needed when any stage uses `provider: openai`
10
+ # (whisper-1, GPT models, tts-1, vision chat). https://platform.openai.com
11
+ OPENAI_API_KEY=
12
+
13
+ # ElevenLabs — needed when TTS uses `provider: elevenlabs`
14
+ # https://elevenlabs.io/app/settings/api-keys
15
+ ELEVENLABS_API_KEY=
16
+
17
+
18
+ # ── Optional ─────────────────────────────────────────────────
19
+
20
+ # CORS allow-list for the web app (default: * — any origin)
21
+ # CORS_ORIGINS=https://yourdomain.com
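The package README describes `CORS_ORIGINS` as a comma-separated list of allowed origins that defaults to `*`. How `api/config.py` actually reads the variable is not shown in this diff, so the following is only a sketch of one plausible way to parse it; the function name `parse_cors_origins` is hypothetical.

```python
import os
from typing import List, Optional


def parse_cors_origins(raw: Optional[str] = None) -> List[str]:
    """Split a comma-separated origin list, falling back to ["*"].

    Hypothetical helper; the real parsing in Violin may differ.
    """
    if raw is None:
        # Unset variable: use the documented default of "*" (any origin).
        raw = os.environ.get("CORS_ORIGINS", "*")
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    # An empty or whitespace-only value also falls back to the default.
    return origins or ["*"]
```

The resulting list is the shape that an allow-list setting such as FastAPI's `CORSMiddleware(allow_origins=...)` expects.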
violin-0.1.0/.gitignore ADDED
@@ -0,0 +1,18 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
11
+ .env
12
+ examples/
13
+
14
+ #logs
15
+ *.log
16
+ jobs/
17
+
18
+ .DS_Store
violin-0.1.0/.python-version ADDED
@@ -0,0 +1 @@
1
+ 3.13
violin-0.1.0/Caddyfile ADDED
@@ -0,0 +1,12 @@
1
+ # Replace with your actual domain name.
2
+ # Caddy will automatically provision a Let's Encrypt TLS certificate
3
+ # once the domain's DNS A record points to this server's public IP.
4
+ #
5
+ # For local testing without a domain, replace with:
6
+ # :80 {
7
+ # reverse_proxy violin:8000
8
+ # }
9
+
10
+ violin-ai.com {
11
+ reverse_proxy violin:8000
12
+ }
violin-0.1.0/Dockerfile ADDED
@@ -0,0 +1,19 @@
1
+ FROM python:3.13-slim AS base
2
+
3
+ RUN apt-get update && \
4
+ apt-get install -y --no-install-recommends libsndfile1 ffmpeg && \
5
+ rm -rf /var/lib/apt/lists/*
6
+
7
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
8
+
9
+ WORKDIR /app
10
+
11
+ COPY pyproject.toml uv.lock ./
12
+ RUN uv sync --frozen --no-dev --no-install-project
13
+
14
+ COPY . .
15
+ RUN uv sync --frozen --no-dev
16
+
17
+ EXPOSE 8000
18
+
19
+ CMD ["uv", "run", "run_api.py", "--host", "0.0.0.0", "--port", "8000", "--config", "config/prod.yaml"]
violin-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Violin contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
violin-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,315 @@
1
+ Metadata-Version: 2.4
2
+ Name: violin
3
+ Version: 0.1.0
4
+ Summary: Open-source video dubbing — translate any video into 33 languages with native-sounding voice-over and synced subtitles.
5
+ Project-URL: Repository, https://github.com/shang-zhu/Violin
6
+ Project-URL: Issues, https://github.com/shang-zhu/Violin/issues
7
+ Author: Shang Zhu, Qinghong Lin
8
+ License: MIT
9
+ License-File: LICENSE
10
+ Keywords: dubbing,elevenlabs,subtitles,together-ai,translation,tts,video,whisper
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: End Users/Desktop
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
21
+ Classifier: Topic :: Multimedia :: Video
22
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
+ Requires-Python: >=3.10
24
+ Requires-Dist: aiofiles>=25.1.0
25
+ Requires-Dist: elevenlabs>=1.0.0
26
+ Requires-Dist: fastapi>=0.135.2
27
+ Requires-Dist: ffmpeg-python>=0.2.0
28
+ Requires-Dist: httpx>=0.28.1
29
+ Requires-Dist: imageio-ffmpeg>=0.6.0
30
+ Requires-Dist: openai>=1.0.0
31
+ Requires-Dist: python-dotenv>=1.2.2
32
+ Requires-Dist: python-multipart>=0.0.22
33
+ Requires-Dist: pyyaml>=6.0.3
34
+ Requires-Dist: soundfile>=0.13.1
35
+ Requires-Dist: together>=2.5.0
36
+ Requires-Dist: uvicorn>=0.42.0
37
+ Requires-Dist: yt-dlp>=2025.1.0
38
+ Description-Content-Type: text/markdown
39
+
40
+ # 🎻 Violin
41
+
42
+ **Open-source video dubbing — translate any video into 33 languages with natural-sounding voice-over and synced subtitles.**
43
+
44
+ [🌐 Live demo](https://violin-ai.com) · [📜 MIT License](https://github.com/shang-zhu/violin/blob/main/LICENSE)
45
+
46
+ <!-- ![demo](assets/outcome.png) -->
47
+
48
+ Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.
49
+
50
+ Available as a **CLI**, a **FastAPI web app**, and a **Claude Code skill**.
51
+
52
+ ---
53
+
54
+ ## ✨ Features
55
+
56
+ - **33 target languages** with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
57
+ - **In-video Q&A** — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
58
+ - **Natural-language voice picker** — describe the voice you want, an LLM picks from the catalog
59
+ - **6 style profiles** *(experimental)* — standard / kids / academic / casual / storyteller / news
60
+ - **Pluggable stack** — Together / OpenAI / ElevenLabs interchangeable for every stage, one YAML
61
+
62
+ ---
63
+
64
+ ## 🚀 Quick start
65
+
66
+ ### Try it without installing anything
67
+
68
+ The live demo runs at **<https://violin-ai.com>** — drop a short clip in, get a dubbed video out in a few minutes.
69
+
70
+ ### Run locally
71
+
72
+ Requires **Python 3.10+** and **ffmpeg** on PATH.
73
+
74
+ ```bash
75
+ curl -LsSf https://astral.sh/uv/install.sh | sh # install uv if you don't have it
76
+ uv tool install violin # recommended — faster, isolated
77
+ # or: pip install violin # if you'd rather install into your current Python env
78
+
79
+ export TOGETHER_API_KEY=... # get one at https://api.together.ai (add to ~/.zshrc to persist)
80
+ ```
81
+
82
+ Three ways to use it:
83
+
84
+ **1. CLI** — translate one file:
85
+
86
+ ```bash
87
+ violin lecture.mp4 lecture_zh.mp4 --language Chinese
88
+ ```
89
+
90
+ **2. Web app** — full REST API + browser UI:
91
+
92
+ ```bash
93
+ violin-api
94
+ # → http://127.0.0.1:8000 (browser UI)
95
+ # → http://127.0.0.1:8000/docs (interactive API docs)
96
+ ```
97
+
98
+ **3. Claude Code skill** — invoke from any Claude Code session:
99
+
100
+ ```bash
101
+ violin --install-skill # one-time: copies the skill into ~/.claude/skills/
102
+ claude
103
+ > please use the violin skill to translate path/to/video.mp4 into Chinese
104
+ ```
105
+
106
+ <details><summary>Run from source (for hacking on the pipeline)</summary>
107
+
108
+ ```bash
109
+ git clone https://github.com/shang-zhu/violin.git
110
+ cd violin
111
+ uv sync
112
+ cp .env.example .env # then fill in TOGETHER_API_KEY
113
+ uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese
114
+ ```
115
+ </details>
116
+
117
+ ---
118
+
119
+ ## 🎬 How Violin works
120
+
121
+ ```
122
+ Video
123
+
124
+ ├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
125
+
126
+ ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
127
+
128
+ ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
129
+
130
+ ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
131
+
132
+ └─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
133
+ concat with freeze-frame fallback,
134
+ single-pass AAC encode the audio track,
135
+ write output mp4 + optional SRT
136
+ ```
137
+
138
+ ---
139
+
140
+ ## ⚙️ Configuration
141
+
142
+ Override any default by writing your own YAML and passing it with `--config my.yaml` — only the keys you want to change need to appear; values deep-merge with the [built-in defaults](https://github.com/shang-zhu/violin/blob/main/config/default.yaml).
143
+
144
+ ### Switch providers
145
+
146
+ ```yaml
147
+ # config/default.yaml — pick the stack you want
148
+ models:
149
+ transcription:
150
+ provider: together # together | openai
151
+ model: openai/whisper-large-v3 # together → openai/whisper-large-v3 | openai → whisper-1
152
+ translation:
153
+ provider: together # together | openai
154
+ model: deepseek-ai/DeepSeek-V4-Pro # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
155
+ tts:
156
+ provider: together # together | elevenlabs | openai
157
+ model: cartesia/sonic-3 # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd
158
+ ```
159
+
160
+ ### Production overrides
161
+
162
+ A starter `config/prod.yaml` is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included `Dockerfile` + `docker-compose.yml` + `Caddyfile` are how the live demo is hosted — `docker compose up -d --build` after filling `.env` is enough to put a copy of Violin behind auto-HTTPS on any Docker host.
163
+
164
+ ### Environment variables
165
+
166
+ | Variable | When required | Description |
167
+ |----------|---------------|-------------|
168
+ | `TOGETHER_API_KEY` | **Recommended** — covers every stage with the default config | Together AI API key |
169
+ | `OPENAI_API_KEY` | Any stage uses `provider: openai` | Covers `whisper-1`, GPT models, and `tts-1` |
170
+ | `ELEVENLABS_API_KEY` | TTS uses `provider: elevenlabs` | ElevenLabs API key |
171
+ | `CORS_ORIGINS` | Optional | Comma-separated allowed origins (default: `*`) |
172
+
173
+ > You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on `openai`) work too — `OPENAI_API_KEY` alone is enough. Same idea for ElevenLabs.
174
+
175
+ ---
176
+
177
+ ## 🎭 Style profiles
178
+
179
+ Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use `--style <name>` on the CLI or pass `style` in API requests.
180
+
181
+ | Style | Tone | TTS speed | Emotion |
182
+ |-------|------|-----------|---------|
183
+ | `standard` | Faithful translation, natural voice | 1.0× | — |
184
+ | `kids` | Rewritten for a 7-year-old, plain language | 1.0× | excited |
185
+ | `academic` | Formal register, preserves jargon and honorifics | 0.95× | calm |
186
+ | `casual` | Spoken slang, contractions, friendly | 1.1× | content |
187
+ | `storyteller` | Vivid, dramatic narration | 0.9× | enthusiastic |
188
+ | `news` | Concise, declarative, broadcast-style | 1.0× | neutral |
189
+
190
+ Add your own by editing `prompts/styles.yaml`.
191
+
192
+ See all available styles: `violin --style list`.
193
+
194
+ ---
195
+
196
+ ## 💻 CLI usage
197
+
198
+ > Examples use the PyPI-installed `violin` command. If you're running from a git checkout, substitute `uv run main.py` for `violin` (and `uv run run_api.py` for `violin-api`).
199
+
200
+ ```bash
201
+ # Basic
202
+ violin lecture.mp4 lecture_es.mp4 --language Spanish
203
+
204
+ # Pick a style
205
+ violin talk.mp4 talk_zh.mp4 --language Chinese --style kids
206
+
207
+ # Pick a specific voice
208
+ violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"
209
+
210
+ # Skip SRT
211
+ violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles
212
+
213
+ # Full replacement (no original audio underneath)
214
+ violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover
215
+
216
+ # Custom config (e.g. switch to OpenAI/ElevenLabs)
217
+ violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml
218
+ ```
219
+
220
+ ### CLI flags
221
+
222
+ | Flag | Default | Description |
223
+ |------|---------|-------------|
224
+ | `--language` / `-l` | *(required)* | Target language name (e.g. `Spanish`, `Japanese`) |
225
+ | `--voice` / `-v` | auto | TTS voice. Defaults to the primary native voice for the target language |
226
+ | `--source-language` | `auto-detect` | Source language hint for translation |
227
+ | `--no-subtitles` | off | Skip SRT generation |
228
+ | `--voiceover` / `--no-voiceover` | voiceover on | Keep original audio underneath the dub, or full replacement |
229
+ | `--style` / `-s` | `standard` | Style profile name. Use `--style list` to see all |
230
+ | `--config` / `-c` | `config/default.yaml` | Path to a YAML override file |
231
+ | `--timings-out` | off | Write per-step wall-clock timings + cost as JSON |
232
+
233
+ ---
234
+
235
+ ## 🛰️ Web app & REST API
236
+
237
+ ```bash
238
+ violin-api # default dev mode
239
+ violin-api --host 0.0.0.0 --port 8080 # bind everywhere
240
+ violin-api --config config/prod.yaml # production overrides (requires a git checkout for config/prod.yaml)
241
+ ```
242
+
243
+ Core flow: `POST /jobs` to start, `GET /jobs/{id}` to poll, `GET /jobs/{id}/video` and `/srt` to download, `POST /jobs/{id}/chat` for in-video Q&A. Full list with request/response schemas at **`/docs`**.
244
+
245
+ ### Example
246
+
247
+ ```bash
248
+ # Submit
249
+ JOB=$(curl -s -X POST http://localhost:8000/jobs \
250
+ -F "file=@lecture.mp4" \
251
+ -F "language=Spanish" \
252
+ -F "style=academic" | jq -r .id)
253
+
254
+ # Poll
255
+ curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'
256
+
257
+ # Download
258
+ curl -OJ http://localhost:8000/jobs/$JOB/video
259
+ curl -OJ http://localhost:8000/jobs/$JOB/srt
260
+ ```
261
+
262
+ Job data lives under `jobs/{id}/`. Set `api.job_ttl_hours` to auto-delete jobs older than N hours (default `0` = disabled; `config/prod.yaml` uses 24h for the public demo).
263
+
264
+ ---
265
+
266
+ ## 🌍 Supported languages
267
+
268
+ Violin supports **33 target languages**. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs `eleven_v3`).
269
+
270
+ Ordered by native-speaker population.
271
+
272
+ | Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
273
+ |----------|-------------------------------|---------------------------------|
274
+ | Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
275
+ | Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
276
+ | English | tutorial man / helpful woman | Adam / Sarah |
277
+ | Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
278
+ | Arabic | middle eastern woman | Faris / Haneen |
279
+ | Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
280
+ | Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
281
+ | Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
282
+ | Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
283
+ | German | german reporter man / german conversational woman | Daniel / Sina |
284
+ | Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
285
+ | French | french narrator man / french narrator lady | Lior / Virginie |
286
+ | Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
287
+ | Polish | polish confident man / polish narrator woman | Gregor / Jola |
288
+ | Dutch | dutch confident man / dutch man | Ronald / Jolanda |
289
+ | Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |
290
+
291
+ The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
292
+
293
+ ---
294
+
295
+ ## 🤝 Contributing
296
+
297
+ PRs welcome. Got questions or hit a bug? Email **<heyviolinai@gmail.com>** or open an issue.
298
+
299
+ ---
300
+
301
+ ## ⚠️ Disclaimer
302
+
303
+ This is a personal open-source project, not a Together AI product. Users are responsible for ensuring they have the right to download and translate any content they process. Designed for Creative Commons, public domain, your own recordings, and other content you have permission to use.
304
+
305
+ ---
306
+
307
+ ## 📜 License
308
+
309
+ [MIT](https://github.com/shang-zhu/violin/blob/main/LICENSE) — use it freely, including commercially.
310
+
311
+ ---
312
+
313
+ ## 🙏 Acknowledgements
314
+
315
+ Built on top of [Together AI](https://together.ai), [Whisper](https://github.com/openai/whisper), [Cartesia Sonic 3](https://cartesia.ai), [ElevenLabs](https://elevenlabs.io), [FastAPI](https://fastapi.tiangolo.com/), and [ffmpeg](https://ffmpeg.org).
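The Configuration section of PKG-INFO says that an override YAML only needs the keys being changed and that values deep-merge with the built-in defaults. The merge code itself (presumably in `pipeline/config.py`) is not shown here, so the sketch below illustrates the general technique on plain dictionaries rather than Violin's implementation; `deep_merge` is a hypothetical name, and the sample keys are taken from the "Switch providers" YAML.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge key by key, other values are replaced.

    Illustrative only; not taken from the Violin source.
    """
    merged = deepcopy(defaults)  # never mutate the caller's defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: recurse so sibling keys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override wins outright.
            merged[key] = deepcopy(value)
    return merged


# Defaults mirroring part of the documented `models:` section.
defaults = {
    "models": {
        "translation": {"provider": "together", "model": "deepseek-ai/DeepSeek-V4-Pro"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}

# An override file that only switches the TTS stage.
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}

merged = deep_merge(defaults, override)
```

After the merge, the TTS stage uses the overridden provider while the translation stage keeps its default, which is the behavior the README describes for `--config my.yaml`.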