whisper-api 0.1.0
- package/LICENSE +661 -0
- package/NOTICE +22 -0
- package/README.md +163 -0
- package/bin/whisper-api.js +4 -0
- package/deploy/nginx.conf +33 -0
- package/deploy/whisper-api.service +31 -0
- package/dist/whisper-api.js +1125 -0
- package/dist/whisper-api.js.map +1 -0
- package/package.json +79 -0
- package/web/index.html +82 -0
package/NOTICE
ADDED
@@ -0,0 +1,22 @@
whisper-api
Copyright (C) 2026 Alexey Abramov

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version. See the LICENSE file for the full text.

This product bundles or depends on third-party software:

- whisper.cpp (https://github.com/ggml-org/whisper.cpp) — MIT License.
  Built/invoked at runtime for the native transcription engine.
- Whisper models (OpenAI) — distributed under the MIT License; GGML conversions
  hosted at https://huggingface.co/ggerganov/whisper.cpp.
- @huggingface/transformers / transformers.js — Apache-2.0 License.
- onnxruntime-node — MIT License.
- ffmpeg-static (bundled FFmpeg binaries) — FFmpeg is licensed under LGPL/GPL;
  see https://github.com/eugeneware/ffmpeg-static for build details.
- fastify and @fastify/* plugins — MIT License.

The names "Whisper" and "OpenAI" are used only to describe API compatibility.
This project is not affiliated with or endorsed by OpenAI.

package/README.md
ADDED
@@ -0,0 +1,163 @@
# whisper-api

[](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/ci.yml)
[](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/codeql.yml)
[](https://scorecard.dev/viewer/?uri=github.com/alexey-a-abramov/whisper-api)
[](https://www.npmjs.com/package/whisper-api)
[](https://www.npmjs.com/package/whisper-api)
[](LICENSE)
[](https://nodejs.org)

**Self-hostable, OpenAI-compatible Whisper speech-to-text endpoint you can stand up on any machine with one command.**

It speaks the exact same HTTP API as OpenAI's `POST /v1/audio/transcriptions`, so any tool, SDK, or app that talks to OpenAI Whisper can point at *your* server instead — your audio never leaves your box. Transcription runs locally via [whisper.cpp](https://github.com/ggml-org/whisper.cpp) (fast, GPU-capable) or a pure-JavaScript ONNX engine ([transformers.js](https://github.com/huggingface/transformers.js)) that needs no compiler.

```bash
npx whisper-api init     # pick models, download them, mint an API key
npx whisper-api start    # serve the OpenAI-compatible endpoint
```

---

## Features

- 🔌 **Drop-in OpenAI compatibility** — `/v1/audio/transcriptions`, `/v1/audio/translations`, `/v1/models`. Repoint any OpenAI client's `base_url` and it just works.
- 🧠 **Dual engine, auto-detected** — native **whisper.cpp** (CPU + NVIDIA CUDA / Apple Metal, up to `large-v3`) when available, transparent fallback to the portable **ONNX** engine so `npx` works on a bare VPS with no build tools.
- 🔑 **API key management** — generate, list, and revoke bearer keys. Only salted SHA-256 hashes are stored; raw keys are shown once.
- 🚦 **Per-key rate limiting** and configurable upload size limits.
- 📦 **Background model downloads** with progress, from tiny (75 MB) to large-v3 (3.1 GB).
- 🩺 **`/health` endpoint** and a minimal **web status page** at `/`.
- 🚀 **Turnkey deployment** — Dockerfile, docker-compose, systemd unit, and an nginx/Caddy TLS reverse-proxy sample.

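To make the per-key rate limiting feature concrete, here is a minimal fixed-window sketch. It is not taken from the package: the `FixedWindowLimiter` class, its parameters, and the choice of a fixed window (rather than a sliding window or token bucket) are all assumptions made for illustration. The default of 120 requests per 60 seconds mirrors the figure given under Security.

```python
import time


class FixedWindowLimiter:
    """Hypothetical per-key limiter: at most max_requests per window."""

    def __init__(self, max_requests=120, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._windows = {}          # key id -> (window start, request count)

    def allow(self, key_id):
        """Return True if this request fits in the key's current window."""
        now = self.clock()
        start, count = self._windows.get(key_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[key_id] = (start, count)
            return False
        self._windows[key_id] = (start, count + 1)
        return True
```

Each key gets its own counter, so one busy client cannot exhaust another client's budget. A server would typically answer a rejected request with HTTP 429.
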
## Requirements

- **Node.js ≥ 20**.
- That's it for the ONNX engine. For the native **whisper.cpp** engine you also need `git`, `cmake`, and a C/C++ compiler (or a prebuilt `whisper-cli` binary pointed to by `WHISPER_CPP_BIN`). FFmpeg is bundled via `ffmpeg-static` — nothing to install.

## Quick start

```bash
# 1. Interactive setup — choose engine, select & download models, create your first key
npx whisper-api init

# 2. Start the server (defaults to 0.0.0.0:8080)
npx whisper-api start

# 3. From any other machine / app:
curl http://YOUR_SERVER:8080/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-wapi-..." \
  -F file=@audio.m4a \
  -F model=whisper-1
```

## Using it from a third-party app

Anything that supports the OpenAI API works — just change the base URL and key.

**Python (official `openai` SDK):**

```python
from openai import OpenAI

client = OpenAI(base_url="https://transcribe.example.com/v1", api_key="sk-wapi-...")
with open("meeting.m4a", "rb") as f:
    print(client.audio.transcriptions.create(model="whisper-1", file=f).text)
```

**Node (official `openai` SDK):**

```js
import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({ baseURL: "https://transcribe.example.com/v1", apiKey: "sk-wapi-..." });
const out = await client.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("meeting.m4a"),
});
console.log(out.text);
```

**Other OpenAI-compatible apps** (Open WebUI, n8n, LibreChat, Raycast, …): set **Base URL** to `https://your-server/v1` and **API key** to a key from `whisper-api key generate`.

## CLI

| Command | Description |
| --- | --- |
| `whisper-api init` | Interactive setup: engine, models, first API key. |
| `whisper-api start [-p 8080] [--host 0.0.0.0] [-m base.en] [-e auto]` | Start the API server. |
| `whisper-api models list` | List available and installed models. |
| `whisper-api models pull <name>` | Download a model (e.g. `large-v3`). |
| `whisper-api models rm <name>` | Remove a downloaded GGML model. |
| `whisper-api key generate [-n name]` | Mint a new API key (shown once). |
| `whisper-api key list` | List keys with status and last use. |
| `whisper-api key revoke <id\|prefix>` | Revoke a key. |
| `whisper-api status` | Show config, installed models, and key count. |
| `whisper-api build-engine` | Build whisper.cpp from source for native speed. |

### Models

`tiny`, `base`, `small`, `medium` (and `.en` English-only variants), `large-v3-turbo`, `large-v3`. The OpenAI alias **`whisper-1`** maps to your configured default model.

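The alias behaviour described above can be sketched as follows. This is an illustration only, not the package's code: `resolve_model` and `KNOWN_MODELS` are hypothetical names, the set of `.en` variants is inferred from the model list, and rejecting unknown names with an error is an assumption.

```python
# Hypothetical sketch of model-name resolution; names are illustrative only.
BASE_MODELS = ["tiny", "base", "small", "medium"]
KNOWN_MODELS = (
    set(BASE_MODELS)
    | {f"{name}.en" for name in BASE_MODELS}  # English-only variants
    | {"large-v3-turbo", "large-v3"}
)


def resolve_model(requested, default_model):
    """Map the `model` form field to a concrete model name."""
    # A missing field and the OpenAI alias both fall back to the default.
    if requested is None or requested == "whisper-1":
        return default_model
    if requested not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {requested}")
    return requested
```

With a default of `base.en`, a client that sends `model=whisper-1` would be served by `base.en`, while `model=large-v3` selects that model explicitly.
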
## HTTP API

All `/v1/*` routes require `Authorization: Bearer <key>`. `/health` and `/` are public.

### `POST /v1/audio/transcriptions`
`multipart/form-data`:

| Field | Required | Notes |
| --- | --- | --- |
| `file` | ✅ | Audio/video file (any format FFmpeg can read). |
| `model` | | Model name or `whisper-1`. Defaults to your configured model. |
| `language` | | ISO-639-1 hint, e.g. `en`. |
| `response_format` | | `json` (default), `verbose_json`, `text`, `srt`, `vtt`. |
| `temperature` | | Sampling temperature. |
| `prompt` | | Decoding/vocabulary hint. |

`json` → `{ "text": "..." }`. `verbose_json` adds `language`, `duration`, and `segments[]`.

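To show how the `verbose_json` and `srt` formats relate, the sketch below renders a list of segments as SubRip text. It assumes each segment carries `start` and `end` times in seconds plus a `text` string, as in OpenAI's `verbose_json` responses; the README does not list the segment fields, and `srt_timestamp` and `segments_to_srt` are hypothetical helpers, not part of whisper-api.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (assumed keys: start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)
```

A client that only receives `verbose_json` could use a helper like this to produce subtitles locally instead of making a second request with `response_format=srt`.
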
### `POST /v1/audio/translations`
Same fields; transcribes **and translates to English**.

### `GET /v1/models`
OpenAI-shaped model list. ・ **`GET /health`** → `{ status, engine, model, activeKeys, uptime, version }`.

## Configuration

State lives in `~/.whisper-api/` (override with `WHISPER_API_HOME`): `config.json`, `keys.json`, `models/` (GGML), `cache/` (ONNX weights), `bin/` (built whisper.cpp). Any setting can be overridden by environment variables — see [`.env.example`](.env.example).

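The state layout above can be expressed as a small path resolver, which is handy in backup or monitoring scripts. This is a sketch under assumptions: `state_paths` is a hypothetical helper, and treating an empty `WHISPER_API_HOME` the same as an unset one is a guess about the package's behaviour.

```python
import os
from pathlib import Path


def state_paths(env=None):
    """Resolve the whisper-api state directory and the entries inside it."""
    env = os.environ if env is None else env
    # WHISPER_API_HOME overrides the default ~/.whisper-api location.
    home = Path(env.get("WHISPER_API_HOME") or Path.home() / ".whisper-api")
    return {
        "home": home,
        "config": home / "config.json",
        "keys": home / "keys.json",
        "models": home / "models",  # GGML models
        "cache": home / "cache",    # ONNX weights
        "bin": home / "bin",        # built whisper.cpp
    }
```

For example, with `WHISPER_API_HOME=/var/lib/whisper-api` (the value used in the sample systemd unit), the key store resolves to `/var/lib/whisper-api/keys.json`.
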
## Deployment

**Docker:**

```bash
docker compose up -d --build
docker compose exec whisper-api node bin/whisper-api.js key generate
curl localhost:8080/health
```

**systemd + nginx:** see [`deploy/whisper-api.service`](deploy/whisper-api.service) and [`deploy/nginx.conf`](deploy/nginx.conf) (includes a Caddy alternative). Run behind TLS; audio uploads can be large, so the samples raise `client_max_body_size` and proxy timeouts.

## Security

- Keys are random 256-bit secrets prefixed `sk-wapi-`; only their SHA-256 hashes are stored (`keys.json`, mode `600`). Compared in constant time.
- Bind to `127.0.0.1` and terminate TLS at a reverse proxy for public deployments.
- Per-key rate limiting (`WHISPER_API_RATE_MAX`, default 120/min) and a 25 MB upload cap by default.
- This repo runs **CodeQL** code scanning, **OpenSSF Scorecard**, and **Dependabot**. To report a vulnerability privately, see [`SECURITY.md`](SECURITY.md).

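The key-handling scheme in the first bullet can be sketched as below. This is not the package's implementation: the helper names, the hex encoding of the secret, and the hex digest format are assumptions, and the sketch leaves out the salt mentioned under Features because the README does not say how it is applied.

```python
import hashlib
import hmac
import secrets

PREFIX = "sk-wapi-"


def hash_key(raw_key):
    """Return the SHA-256 hex digest that would be stored instead of the key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_key():
    """Mint a key with 256 bits of randomness; the raw value is shown once."""
    raw = PREFIX + secrets.token_hex(32)  # 32 random bytes = 256 bits
    return raw, hash_key(raw)


def verify_key(presented, stored_hashes):
    """Check a presented bearer key against the stored digests."""
    candidate = hash_key(presented)
    matched = False
    for stored in stored_hashes:
        # compare_digest takes time independent of where the strings differ.
        matched |= hmac.compare_digest(candidate, stored)
    return matched
```

Storing only the digest means a leaked `keys.json` does not reveal usable keys, and the constant-time comparison avoids leaking how many leading characters of a guess were correct.
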
## Development

```bash
npm install
npm run dev -- init     # run the CLI from source via tsx
npm run typecheck
npm test
npm run build           # bundle to dist/ with tsup
```

See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the contribution workflow and coding conventions.

## License

[AGPL-3.0-or-later](LICENSE). If you run a modified version as a network service, the AGPL requires you to offer your users the corresponding source. See [`NOTICE`](NOTICE) for third-party attributions. "Whisper" and "OpenAI" are referenced only to describe API compatibility; this project is not affiliated with OpenAI.

package/deploy/nginx.conf
ADDED
@@ -0,0 +1,33 @@
# Reverse proxy + TLS termination for whisper-api.
# Pair with certbot for Let's Encrypt certificates.
#
# sudo cp deploy/nginx.conf /etc/nginx/sites-available/whisper-api
# sudo ln -s /etc/nginx/sites-available/whisper-api /etc/nginx/sites-enabled/
# sudo certbot --nginx -d transcribe.example.com
# sudo nginx -t && sudo systemctl reload nginx

server {
    listen 80;
    server_name transcribe.example.com;
    # certbot inserts the HTTPS server block and the 80->443 redirect.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Audio uploads can be large; raise limits and timeouts.
        client_max_body_size 200m;
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
        proxy_request_buffering off;
    }
}

# --- Caddy alternative (Caddyfile) -------------------------------------------
# transcribe.example.com {
# reverse_proxy 127.0.0.1:8080
# request_body { max_size 200MB }
# }

package/deploy/whisper-api.service
ADDED
@@ -0,0 +1,31 @@
# systemd unit for whisper-api.
# Install:
# sudo cp deploy/whisper-api.service /etc/systemd/system/
# sudo useradd --system --home /var/lib/whisper-api --create-home whisper || true
# sudo -u whisper npx whisper-api@latest init # one-time setup as the service user
# sudo systemctl daemon-reload && sudo systemctl enable --now whisper-api
[Unit]
Description=whisper-api (self-hosted OpenAI-compatible Whisper endpoint)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=whisper
Group=whisper
Environment=WHISPER_API_HOME=/var/lib/whisper-api
Environment=WHISPER_API_HOST=127.0.0.1
Environment=WHISPER_API_PORT=8080
# Pin a version for reproducible deploys; npx caches it after first run.
ExecStart=/usr/bin/npx --yes whisper-api@latest start
Restart=on-failure
RestartSec=3
# Hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/whisper-api
PrivateTmp=true

[Install]
WantedBy=multi-user.target