whisper-api 0.1.0

package/NOTICE ADDED
@@ -0,0 +1,22 @@
+ whisper-api
+ Copyright (C) 2026 Alexey Abramov
+
+ This program is free software: you can redistribute it and/or modify it under
+ the terms of the GNU Affero General Public License as published by the Free
+ Software Foundation, either version 3 of the License, or (at your option) any
+ later version. See the LICENSE file for the full text.
+
+ This product bundles or depends on third-party software:
+
+ - whisper.cpp (https://github.com/ggml-org/whisper.cpp) — MIT License.
+   Built/invoked at runtime for the native transcription engine.
+ - Whisper models (OpenAI) — distributed under the MIT License; GGML conversions
+   hosted at https://huggingface.co/ggerganov/whisper.cpp.
+ - @huggingface/transformers / transformers.js — Apache-2.0 License.
+ - onnxruntime-node — MIT License.
+ - ffmpeg-static (bundled FFmpeg binaries) — FFmpeg is licensed under LGPL/GPL;
+   see https://github.com/eugeneware/ffmpeg-static for build details.
+ - fastify and @fastify/* plugins — MIT License.
+
+ The names "Whisper" and "OpenAI" are used only to describe API compatibility.
+ This project is not affiliated with or endorsed by OpenAI.
package/README.md ADDED
@@ -0,0 +1,163 @@
+ # whisper-api
+
+ [![CI](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/ci.yml/badge.svg)](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/ci.yml)
+ [![CodeQL](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/codeql.yml/badge.svg)](https://github.com/alexey-a-abramov/whisper-api/actions/workflows/codeql.yml)
+ [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/alexey-a-abramov/whisper-api/badge)](https://scorecard.dev/viewer/?uri=github.com/alexey-a-abramov/whisper-api)
+ [![npm version](https://img.shields.io/npm/v/whisper-api.svg)](https://www.npmjs.com/package/whisper-api)
+ [![npm downloads](https://img.shields.io/npm/dm/whisper-api.svg)](https://www.npmjs.com/package/whisper-api)
+ [![License: AGPL v3](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)
+ [![Node.js](https://img.shields.io/node/v/whisper-api.svg)](https://nodejs.org)
+
+ **Self-hostable, OpenAI-compatible Whisper speech-to-text endpoint you can stand up on any machine with one command.**
+
+ It speaks the exact same HTTP API as OpenAI's `POST /v1/audio/transcriptions`, so any tool, SDK, or app that talks to OpenAI Whisper can point at *your* server instead — your audio never leaves your box. Transcription runs locally via [whisper.cpp](https://github.com/ggml-org/whisper.cpp) (fast, GPU-capable) or a pure-JavaScript ONNX engine ([transformers.js](https://github.com/huggingface/transformers.js)) that needs no compiler.
+
+ ```bash
+ npx whisper-api init # pick models, download them, mint an API key
+ npx whisper-api start # serve the OpenAI-compatible endpoint
+ ```
+
+ ---
+
+ ## Features
+
+ - 🔌 **Drop-in OpenAI compatibility** — `/v1/audio/transcriptions`, `/v1/audio/translations`, `/v1/models`. Repoint any OpenAI client's `base_url` and it just works.
+ - 🧠 **Dual engine, auto-detected** — native **whisper.cpp** (CPU + NVIDIA CUDA / Apple Metal, up to `large-v3`) when available, transparent fallback to the portable **ONNX** engine so `npx` works on a bare VPS with no build tools.
+ - 🔑 **API key management** — generate, list, and revoke bearer keys. Only salted SHA-256 hashes are stored; raw keys are shown once.
+ - 🚦 **Per-key rate limiting** and configurable upload size limits.
+ - 📦 **Background model downloads** with progress, from tiny (75 MB) to large-v3 (3.1 GB).
+ - 🩺 **`/health` endpoint** and a minimal **web status page** at `/`.
+ - 🚀 **Turnkey deployment** — Dockerfile, docker-compose, systemd unit, and an nginx/Caddy TLS reverse-proxy sample.
+
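To make the per-key rate limiting bullet concrete, here is a minimal Python sketch of one way such a limiter could work. The README does not say which algorithm the server uses, so the fixed-window strategy, the `FixedWindowLimiter` class, and its parameters are all assumptions for illustration; only the 120 requests per minute default comes from the README's Security section.

```python
import time


class FixedWindowLimiter:
    """Allow at most `max_requests` per key within each `window_s`-second window.

    Hypothetical illustration; not the package's actual implementation.
    """

    def __init__(self, max_requests=120, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock          # injectable so the limiter can be tested
        self._windows = {}          # key id -> (window index, request count)

    def allow(self, key_id):
        window = int(self.clock() // self.window_s)
        start, count = self._windows.get(key_id, (window, 0))
        if start != window:         # a new window began: reset the counter
            start, count = window, 0
        if count >= self.max_requests:
            return False            # over the limit for this window
        self._windows[key_id] = (start, count + 1)
        return True
```

Each key gets its own counter, so one busy client cannot exhaust another client's budget. A token bucket or sliding window would smooth out bursts at window boundaries, at the cost of a little more state.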
+ ## Requirements
+
+ - **Node.js ≥ 20**.
+ - That's it for the ONNX engine. For the native **whisper.cpp** engine you also need `git`, `cmake`, and a C/C++ compiler (or a prebuilt `whisper-cli` binary pointed to by `WHISPER_CPP_BIN`). FFmpeg is bundled via `ffmpeg-static` — nothing to install.
+
+ ## Quick start
+
+ ```bash
+ # 1. Interactive setup — choose engine, select & download models, create your first key
+ npx whisper-api init
+
+ # 2. Start the server (defaults to 0.0.0.0:8080)
+ npx whisper-api start
+
+ # 3. From any other machine / app:
+ curl http://YOUR_SERVER:8080/v1/audio/transcriptions \
+   -H "Authorization: Bearer sk-wapi-..." \
+   -F file=@audio.m4a \
+   -F model=whisper-1
+ ```
+
+ ## Using it from a third-party app
+
+ Anything that supports the OpenAI API works — just change the base URL and key.
+
+ **Python (official `openai` SDK):**
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="https://transcribe.example.com/v1", api_key="sk-wapi-...")
+ with open("meeting.m4a", "rb") as f:
+     print(client.audio.transcriptions.create(model="whisper-1", file=f).text)
+ ```
+
+ **Node (official `openai` SDK):**
+
+ ```js
+ import OpenAI from "openai";
+ import fs from "node:fs";
+
+ const client = new OpenAI({ baseURL: "https://transcribe.example.com/v1", apiKey: "sk-wapi-..." });
+ const out = await client.audio.transcriptions.create({
+   model: "whisper-1",
+   file: fs.createReadStream("meeting.m4a"),
+ });
+ console.log(out.text);
+ ```
+
+ **Other OpenAI-compatible apps** (Open WebUI, n8n, LibreChat, Raycast, …): set **Base URL** to `https://your-server/v1` and **API key** to a key from `whisper-api key generate`.
+
+ ## CLI
+
+ | Command | Description |
+ | --- | --- |
+ | `whisper-api init` | Interactive setup: engine, models, first API key. |
+ | `whisper-api start [-p 8080] [--host 0.0.0.0] [-m base.en] [-e auto]` | Start the API server. |
+ | `whisper-api models list` | List available and installed models. |
+ | `whisper-api models pull <name>` | Download a model (e.g. `large-v3`). |
+ | `whisper-api models rm <name>` | Remove a downloaded GGML model. |
+ | `whisper-api key generate [-n name]` | Mint a new API key (shown once). |
+ | `whisper-api key list` | List keys with status and last use. |
+ | `whisper-api key revoke <id\|prefix>` | Revoke a key. |
+ | `whisper-api status` | Show config, installed models, and key count. |
+ | `whisper-api build-engine` | Build whisper.cpp from source for native speed. |
+
+ ### Models
+
+ `tiny`, `base`, `small`, `medium` (and `.en` English-only variants), `large-v3-turbo`, `large-v3`. The OpenAI alias **`whisper-1`** maps to your configured default model.
+
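The alias rule above can be sketched in a few lines of Python. This is an illustration only: `resolve_model` and `KNOWN_MODELS` are hypothetical names, and rejecting unknown names with an error is an assumption, since the README does not describe how the server handles them.

```python
BASE_SIZES = ("tiny", "base", "small", "medium")

# Model names listed in the README: four sizes, their `.en` variants,
# and the two large-v3 models.
KNOWN_MODELS = (
    set(BASE_SIZES)
    | {f"{size}.en" for size in BASE_SIZES}
    | {"large-v3-turbo", "large-v3"}
)


def resolve_model(requested, default_model):
    """Map a request's `model` field to a concrete model name (hypothetical helper)."""
    if not requested or requested == "whisper-1":
        return default_model        # the OpenAI alias falls back to the default
    if requested not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {requested}")
    return requested
```

With a configured default of `base.en`, a client that sends `model=whisper-1` is served by `base.en`, while an explicit `model=large-v3` is passed through unchanged.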
+ ## HTTP API
+
+ All `/v1/*` routes require `Authorization: Bearer <key>`. `/health` and `/` are public.
+
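A minimal Python sketch of that routing rule, assuming the check is a simple path-prefix test followed by parsing the `Authorization` header. The function names are hypothetical, and the real server (built on fastify, per the NOTICE file) may structure this differently.

```python
def requires_key(path):
    """True for routes under /v1/, which the README says need a bearer key."""
    return path.startswith("/v1/")


def extract_bearer(header):
    """Return the token from an `Authorization: Bearer <key>` value, else None."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    token = token.strip()
    # The scheme name is compared case-insensitively, as HTTP auth schemes are.
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```

A request to `/health` skips the check entirely, while a request to `/v1/models` without a parseable bearer token would be rejected before any key lookup happens.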
+ ### `POST /v1/audio/transcriptions`
+ `multipart/form-data`:
+
+ | Field | Required | Notes |
+ | --- | --- | --- |
+ | `file` | ✅ | Audio/video file (any format FFmpeg can read). |
+ | `model` | | Model name or `whisper-1`. Defaults to your configured model. |
+ | `language` | | ISO-639-1 hint, e.g. `en`. |
+ | `response_format` | | `json` (default), `verbose_json`, `text`, `srt`, `vtt`. |
+ | `temperature` | | Sampling temperature. |
+ | `prompt` | | Decoding/vocabulary hint. |
+
+ `json` → `{ "text": "..." }`. `verbose_json` adds `language`, `duration`, and `segments[]`.
+
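To show how the `srt` format relates to the `segments[]` array, here is a hedged Python sketch that renders segments as SubRip text. It assumes each segment carries `start` and `end` times in seconds plus a `text` field, as in OpenAI's `verbose_json` responses; the helper names are hypothetical and this is not the package's own formatter.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

WebVTT output would follow the same shape with a `WEBVTT` header line and a period instead of a comma before the milliseconds.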
+ ### `POST /v1/audio/translations`
+ Same fields; transcribes **and translates to English**.
+
+ ### `GET /v1/models`
+ Returns an OpenAI-shaped model list. The public **`GET /health`** route returns `{ status, engine, model, activeKeys, uptime, version }`.
+
+ ## Configuration
+
+ State lives in `~/.whisper-api/` (override with `WHISPER_API_HOME`): `config.json`, `keys.json`, `models/` (GGML), `cache/` (ONNX weights), `bin/` (built whisper.cpp). Any setting can be overridden by environment variables — see [`.env.example`](.env.example).
+
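The state directory layout can be summarised with a short Python sketch. The file and directory names come from the paragraph above; the `state_dir` and `state_paths` helpers are hypothetical, and treating an empty `WHISPER_API_HOME` as unset is an assumption.

```python
import os
from pathlib import Path


def state_dir(env=None):
    """Return the state directory, honouring a WHISPER_API_HOME override."""
    env = os.environ if env is None else env
    override = env.get("WHISPER_API_HOME")
    return Path(override) if override else Path.home() / ".whisper-api"


def state_paths(env=None):
    """Map each documented piece of state to its location on disk."""
    home = state_dir(env)
    return {
        "config": home / "config.json",
        "keys": home / "keys.json",
        "models": home / "models",   # GGML model files
        "cache": home / "cache",     # ONNX weights
        "bin": home / "bin",         # built whisper.cpp
    }
```

Setting `WHISPER_API_HOME=/var/lib/whisper-api`, as the sample systemd unit does, moves every one of these paths under that directory.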
+ ## Deployment
+
+ **Docker:**
+
+ ```bash
+ docker compose up -d --build
+ docker compose exec whisper-api node bin/whisper-api.js key generate
+ curl localhost:8080/health
+ ```
+
+ **systemd + nginx:** see [`deploy/whisper-api.service`](deploy/whisper-api.service) and [`deploy/nginx.conf`](deploy/nginx.conf) (includes a Caddy alternative). Run behind TLS; audio uploads can be large, so the samples raise `client_max_body_size` and proxy timeouts.
+
+ ## Security
+
+ - Keys are random 256-bit secrets prefixed `sk-wapi-`; only their SHA-256 hashes are stored (`keys.json`, mode `600`). Compared in constant time.
+ - Bind to `127.0.0.1` and terminate TLS at a reverse proxy for public deployments.
+ - Per-key rate limiting (`WHISPER_API_RATE_MAX`, default 120/min) and a 25 MB upload cap by default.
+ - This repo runs **CodeQL** code scanning, **OpenSSF Scorecard**, and **Dependabot**. To report a vulnerability privately, see [`SECURITY.md`](SECURITY.md).
+
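The key-handling bullet can be illustrated with a small Python sketch: mint a 256-bit secret, keep only a SHA-256 digest, and verify with a constant-time comparison. The per-key salt follows the Features list, which mentions salted hashes. The record layout, hex encoding, and function names are assumptions and may not match what the package writes to `keys.json`.

```python
import hashlib
import hmac
import secrets

PREFIX = "sk-wapi-"


def _digest(salt, raw):
    """SHA-256 over salt + key, hex-encoded (assumed scheme)."""
    return hashlib.sha256((salt + raw).encode("utf-8")).hexdigest()


def generate_key():
    """Return (raw_key, record); only `record` would be persisted."""
    raw = PREFIX + secrets.token_hex(32)   # 32 random bytes = 256 bits
    salt = secrets.token_hex(16)           # hypothetical per-key salt
    return raw, {"salt": salt, "hash": _digest(salt, raw)}


def verify_key(raw, record):
    """Check a presented key against a stored record in constant time."""
    candidate = _digest(record["salt"], raw)
    return hmac.compare_digest(candidate, record["hash"])
```

Because the raw key never reaches disk, it can only be shown at creation time, which is why `whisper-api key generate` prints it once. `hmac.compare_digest` avoids leaking how many leading characters of a guess were correct.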
+ ## Development
+
+ ```bash
+ npm install
+ npm run dev -- init # run the CLI from source via tsx
+ npm run typecheck
+ npm test
+ npm run build # bundle to dist/ with tsup
+ ```
+
+ See [`CONTRIBUTING.md`](CONTRIBUTING.md) for the contribution workflow and coding conventions.
+
+ ## License
+
+ [AGPL-3.0-or-later](LICENSE). If you run a modified version as a network service, the AGPL requires you to offer your users the corresponding source. See [`NOTICE`](NOTICE) for third-party attributions. "Whisper" and "OpenAI" are referenced only to describe API compatibility; this project is not affiliated with OpenAI.
@@ -0,0 +1,4 @@
+ #!/usr/bin/env node
+ // SPDX-License-Identifier: AGPL-3.0-or-later
+ // whisper-api CLI entrypoint. Thin shim that loads the bundled program.
+ import "../dist/whisper-api.js";
@@ -0,0 +1,33 @@
+ # Reverse proxy + TLS termination for whisper-api.
+ # Pair with certbot for Let's Encrypt certificates.
+ #
+ # sudo cp deploy/nginx.conf /etc/nginx/sites-available/whisper-api
+ # sudo ln -s /etc/nginx/sites-available/whisper-api /etc/nginx/sites-enabled/
+ # sudo certbot --nginx -d transcribe.example.com
+ # sudo nginx -t && sudo systemctl reload nginx
+
+ server {
+     listen 80;
+     server_name transcribe.example.com;
+     # certbot inserts the HTTPS server block and the 80->443 redirect.
+     location / {
+         proxy_pass http://127.0.0.1:8080;
+         proxy_http_version 1.1;
+         proxy_set_header Host $host;
+         proxy_set_header X-Real-IP $remote_addr;
+         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+         proxy_set_header X-Forwarded-Proto $scheme;
+
+         # Audio uploads can be large; raise limits and timeouts.
+         client_max_body_size 200m;
+         proxy_read_timeout 600s;
+         proxy_send_timeout 600s;
+         proxy_request_buffering off;
+     }
+ }
+
+ # --- Caddy alternative (Caddyfile) -------------------------------------------
+ # transcribe.example.com {
+ #     reverse_proxy 127.0.0.1:8080
+ #     request_body { max_size 200MB }
+ # }
@@ -0,0 +1,31 @@
+ # systemd unit for whisper-api.
+ # Install:
+ # sudo cp deploy/whisper-api.service /etc/systemd/system/
+ # sudo useradd --system --home /var/lib/whisper-api --create-home whisper || true
+ # sudo -u whisper npx whisper-api@latest init # one-time setup as the service user
+ # sudo systemctl daemon-reload && sudo systemctl enable --now whisper-api
+ [Unit]
+ Description=whisper-api (self-hosted OpenAI-compatible Whisper endpoint)
+ After=network-online.target
+ Wants=network-online.target
+
+ [Service]
+ Type=simple
+ User=whisper
+ Group=whisper
+ Environment=WHISPER_API_HOME=/var/lib/whisper-api
+ Environment=WHISPER_API_HOST=127.0.0.1
+ Environment=WHISPER_API_PORT=8080
+ # Replace @latest with a pinned version for reproducible deploys; npx caches the package after the first run.
+ ExecStart=/usr/bin/npx --yes whisper-api@latest start
+ Restart=on-failure
+ RestartSec=3
+ # Hardening
+ NoNewPrivileges=true
+ ProtectSystem=strict
+ ProtectHome=true
+ ReadWritePaths=/var/lib/whisper-api
+ PrivateTmp=true
+
+ [Install]
+ WantedBy=multi-user.target