whisper-cpp-node 0.2.11 → 0.2.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2):
  1. package/README.md +348 -348
  2. package/package.json +4 -4
package/README.md CHANGED
@@ -1,235 +1,235 @@
(Every removed line is textually identical to the line that replaces it, likely a whitespace or line-ending change, so the README content is shown once without diff markers.)

# whisper-cpp-node

Node.js bindings for [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - fast speech-to-text with GPU acceleration.

## Features

- **Fast**: Native whisper.cpp performance with GPU acceleration
- **Cross-platform**: macOS (Metal), Windows (Vulkan)
- **Core ML**: Optional Apple Neural Engine support for 3x+ speedup (macOS)
- **OpenVINO**: Optional Intel CPU/GPU encoder acceleration (Windows/Linux)
- **Streaming VAD**: Built-in Silero voice activity detection
- **TypeScript**: Full type definitions included
- **GPU Discovery**: Enumerate available GPU devices for multi-GPU selection
- **Self-contained**: No external dependencies; just install and use

## Requirements

**macOS:**
- macOS 13.3+ (Ventura or later)
- Apple Silicon (M1/M2/M3/M4)
- Node.js 18+

**Windows:**
- Windows 10/11 (x64)
- Node.js 18+
- Vulkan-capable GPU (optional, for GPU acceleration)

## Installation

```bash
npm install whisper-cpp-node
# or
pnpm add whisper-cpp-node
```

The platform-specific binary is installed automatically:
- macOS ARM64: `@whisper-cpp-node/darwin-arm64`
- Windows x64: `@whisper-cpp-node/win32-x64`

## Quick Start

### File-based transcription

```typescript
import {
  createWhisperContext,
  transcribeAsync,
} from "whisper-cpp-node";

// Create a context with your model
const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_gpu: true,
});

// Transcribe an audio file
const result = await transcribeAsync(ctx, {
  fname_inp: "./audio.wav",
  language: "en",
});

// Result: { segments: [["00:00:00,000", "00:00:02,500", " Hello world"], ...] }
for (const [start, end, text] of result.segments) {
  console.log(`[${start} --> ${end}]${text}`);
}

// Clean up
ctx.free();
```

### Buffer-based transcription

```typescript
import {
  createWhisperContext,
  transcribeAsync,
} from "whisper-cpp-node";

const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_gpu: true,
});

// Pass raw PCM audio (16kHz, mono, float32)
const pcmData = new Float32Array(/* your audio samples */);
const result = await transcribeAsync(ctx, {
  pcmf32: pcmData,
  language: "en",
});

for (const [start, end, text] of result.segments) {
  console.log(`[${start} --> ${end}]${text}`);
}

ctx.free();
```
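Capture pipelines often hand back 16-bit integer PCM rather than the normalized float32 the buffer API expects. A minimal conversion sketch; the `int16ToFloat32` helper is illustrative, not part of this package, and it assumes the input is already 16kHz mono:

```typescript
// Convert signed 16-bit PCM samples to normalized float32 in [-1.0, 1.0),
// the format transcribeAsync expects in `pcmf32`.
function int16ToFloat32(pcm16: Int16Array): Float32Array {
  const out = new Float32Array(pcm16.length);
  for (let i = 0; i < pcm16.length; i++) {
    out[i] = pcm16[i] / 32768; // 16384 -> 0.5, -32768 -> -1.0
  }
  return out;
}

const pcmf32 = int16ToFloat32(new Int16Array([0, 16384, -32768]));
```

The resulting `Float32Array` can then be passed as the `pcmf32` option. Resampling to 16kHz, if needed, is a separate step not shown here.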

### Streaming transcription

Get real-time output as audio is processed. The `on_new_segment` callback fires for each segment as it is generated, while the final callback still receives all segments at completion (backward compatible):

```typescript
import { createWhisperContext, transcribe } from "whisper-cpp-node";

const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
});

transcribe(ctx, {
  fname_inp: "./long-audio.wav",
  language: "en",

  // Called for each segment as it's generated
  on_new_segment: (segment) => {
    console.log(`[${segment.start}]${segment.text}`);
  },
}, (err, result) => {
  // Final callback still receives ALL segments at completion
  console.log(`Done! ${result.segments.length} segments`);
  ctx.free();
});
```
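Segment timestamps arrive as SRT-style strings; to compute durations or offsets numerically you can parse them into milliseconds. A sketch (the `timestampToMs` name is an illustrative helper, not an export of this package):

```typescript
// Parse an "HH:MM:SS,mmm" timestamp (as used in segment tuples and in
// StreamingSegment.start/end) into a millisecond count.
function timestampToMs(ts: string): number {
  const [hms, ms] = ts.split(",");
  const [h, m, s] = hms.split(":").map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + Number(ms);
}

// Duration of a segment spanning 00:00:00,000 to 00:00:02,500
const durationMs = timestampToMs("00:00:02,500") - timestampToMs("00:00:00,000"); // 2500
```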

## API

### `createWhisperContext(options)`

Create a persistent context for transcription.

```typescript
interface WhisperContextOptions {
  model: string;                // Path to GGML model file (required)
  use_gpu?: boolean;            // Enable GPU acceleration (default: true)
                                // Uses Metal on macOS, Vulkan on Windows
  use_coreml?: boolean;         // Enable Core ML on macOS (default: false)
  use_openvino?: boolean;       // Enable OpenVINO encoder on Intel (default: false)
  openvino_device?: string;     // OpenVINO device: 'CPU', 'GPU', 'NPU' (default: 'CPU')
  openvino_model_path?: string; // Path to OpenVINO encoder model (auto-derived)
  openvino_cache_dir?: string;  // Cache dir for compiled OpenVINO models
  flash_attn?: boolean;         // Enable Flash Attention (default: false)
  gpu_device?: number;          // GPU device index (default: 0, see getGpuDevices())
  dtw?: string;                 // DTW preset for word timestamps
  no_prints?: boolean;          // Suppress log output (default: false)
}
```

### `transcribeAsync(context, options)`

Transcribe audio (Promise-based). Accepts either a file path or a PCM buffer.

```typescript
// File input
interface TranscribeOptionsFile {
  fname_inp: string; // Path to audio file
  // ... common options
}

// Buffer input
interface TranscribeOptionsBuffer {
  pcmf32: Float32Array; // Raw PCM (16kHz, mono, float32, -1.0 to 1.0)
  // ... common options
}

// Common options (partial list - see types.ts for full options)
interface TranscribeOptionsBase {
  // Language
  language?: string;         // Language code ('en', 'zh', 'auto')
  translate?: boolean;       // Translate to English
  detect_language?: boolean; // Auto-detect language

  // Threading
  n_threads?: number;    // CPU threads (default: 4)
  n_processors?: number; // Parallel processors

  // Audio processing
  offset_ms?: number;   // Start offset in ms
  duration_ms?: number; // Duration to process (0 = all)

  // Output control
  no_timestamps?: boolean;    // Disable timestamps
  max_len?: number;           // Max segment length (chars)
  max_tokens?: number;        // Max tokens per segment
  split_on_word?: boolean;    // Split on word boundaries
  token_timestamps?: boolean; // Include token-level timestamps

  // Sampling
  temperature?: number; // Sampling temperature (0.0 = greedy)
  beam_size?: number;   // Beam search size (-1 = greedy)
  best_of?: number;     // Best-of-N sampling

  // Thresholds
  entropy_thold?: number;   // Entropy threshold
  logprob_thold?: number;   // Log probability threshold
  no_speech_thold?: number; // No-speech probability threshold

  // Context
  prompt?: string;      // Initial prompt text
  no_context?: boolean; // Don't use previous context

  // VAD preprocessing
  vad?: boolean;          // Enable VAD preprocessing
  vad_model?: string;     // Path to VAD model
  vad_threshold?: number; // VAD threshold (0.0-1.0)
  vad_min_speech_duration_ms?: number;
  vad_min_silence_duration_ms?: number;
  vad_speech_pad_ms?: number;

  // Callbacks
  progress_callback?: (progress: number) => void;
  on_new_segment?: (segment: StreamingSegment) => void; // Streaming callback
}

// Streaming segment (passed to the on_new_segment callback)
interface StreamingSegment {
  start: string;             // Start timestamp "HH:MM:SS,mmm"
  end: string;               // End timestamp
  text: string;              // Transcribed text
  segment_index: number;     // 0-based index
  is_partial: boolean;       // Reserved for future use
  tokens?: StreamingToken[]; // Only if token_timestamps enabled
}

// Result
interface TranscribeResult {
  segments: TranscriptSegment[];
}

// A segment is a tuple: [start, end, text]
type TranscriptSegment = [string, string, string];
// Example: ["00:00:00,000", "00:00:02,500", " Hello world"]
```

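Because each segment is already a `[start, end, text]` tuple with SRT-style timestamps, writing an `.srt` file reduces to numbering and joining the cues. A sketch (`formatSrt` is a hypothetical helper, not exported by the package):

```typescript
type TranscriptSegment = [string, string, string];

// Build SRT subtitle text from [start, end, text] tuples. Timestamps are
// already in SRT's "HH:MM:SS,mmm" form, so we only number the cues and
// join them with blank lines.
function formatSrt(segments: TranscriptSegment[]): string {
  return segments
    .map(([start, end, text], i) => `${i + 1}\n${start} --> ${end}\n${text.trim()}`)
    .join("\n\n");
}

const srt = formatSrt([
  ["00:00:00,000", "00:00:02,500", " Hello world"],
  ["00:00:02,500", "00:00:04,000", " Goodbye"],
]);
```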
### `getGpuDevices()`

Enumerate available GPU backend devices. Returns an array of GPU/iGPU devices. Never throws; returns an empty array if no GPUs are available.
@@ -266,122 +266,122 @@ interface GpuDevice {
### `createVadContext(options)`

Create a voice activity detection context for streaming audio.

```typescript
interface VadContextOptions {
  model: string;       // Path to Silero VAD model
  threshold?: number;  // Speech threshold (default: 0.5)
  n_threads?: number;  // Number of threads (default: 1)
  no_prints?: boolean; // Suppress log output
}

interface VadContext {
  getWindowSamples(): number;             // Returns 512 (32ms at 16kHz)
  getSampleRate(): number;                // Returns 16000
  process(samples: Float32Array): number; // Returns probability 0.0-1.0
  reset(): void;                          // Reset LSTM state
  free(): void;                           // Release resources
}
```

#### VAD Example

```typescript
import { createVadContext } from "whisper-cpp-node";

const vad = createVadContext({
  model: "./models/ggml-silero-v6.2.0.bin",
  threshold: 0.5,
});

const windowSize = vad.getWindowSamples(); // 512 samples

// Process audio in 32ms chunks
function processAudioChunk(samples: Float32Array) {
  const probability = vad.process(samples);
  if (probability >= 0.5) {
    console.log("Speech detected!", probability);
  }
}

// Reset when starting a new audio stream
vad.reset();

// Clean up when done
vad.free();
```
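Each call to `process()` must receive exactly `getWindowSamples()` samples (512). Longer capture buffers therefore need to be framed first; a sketch (`toWindows` is an illustrative name, and dropping the trailing partial window is an assumption; a real pipeline would buffer the remainder until more audio arrives):

```typescript
// Split a long capture buffer into fixed-size analysis windows.
// Trailing samples that don't fill a whole window are dropped here.
function toWindows(samples: Float32Array, windowSize: number): Float32Array[] {
  const windows: Float32Array[] = [];
  for (let off = 0; off + windowSize <= samples.length; off += windowSize) {
    // subarray creates a view, so no audio data is copied.
    windows.push(samples.subarray(off, off + windowSize));
  }
  return windows;
}

// e.g. feed each 512-sample window to vad.process(win) in order,
// calling vad.reset() when a new stream starts.
const wins = toWindows(new Float32Array(1600), 512); // 3 full windows, 64 samples left over
```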

## Core ML Acceleration (macOS)

For 3x+ faster encoding on Apple Silicon:

1. Generate a Core ML model:
   ```bash
   pip install ane_transformers openai-whisper coremltools
   ./models/generate-coreml-model.sh base.en
   ```

2. Place it next to your GGML model:
   ```
   models/ggml-base.en.bin
   models/ggml-base.en-encoder.mlmodelc/
   ```

3. Enable Core ML:
   ```typescript
   const ctx = createWhisperContext({
     model: "./models/ggml-base.en.bin",
     use_coreml: true,
   });
   ```

## OpenVINO Acceleration (Intel)

For faster encoder inference on Intel CPUs and GPUs (requires a build with OpenVINO support):

1. Install OpenVINO and convert the model:
   ```bash
   pip install openvino openvino-dev
   python models/convert-whisper-to-openvino.py --model base.en
   ```

2. The OpenVINO model files are placed next to your GGML model:
   ```
   models/ggml-base.en.bin
   models/ggml-base.en-encoder-openvino.xml
   models/ggml-base.en-encoder-openvino.bin
   ```

3. Enable OpenVINO:
   ```typescript
   const ctx = createWhisperContext({
     model: "./models/ggml-base.en.bin",
     use_openvino: true,
     openvino_device: "CPU", // or "GPU" for Intel iGPU
     openvino_cache_dir: "./openvino_cache", // optional, speeds up init
   });
   ```

**Note:** OpenVINO support requires the addon to be built with `-DADDON_OPENVINO=ON`.

## Models

Download models from [Hugging Face](https://huggingface.co/ggerganov/whisper.cpp):

```bash
# Base English model (~150MB)
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

# Large v3 Turbo quantized (~500MB)
curl -L -o models/ggml-large-v3-turbo-q4_0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q4_0.bin

# Silero VAD model (for streaming VAD)
curl -L -o models/ggml-silero-v6.2.0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-silero-v6.2.0.bin
```

## License

MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "whisper-cpp-node",
-   "version": "0.2.11",
+   "version": "0.2.12",
    "description": "Node.js bindings for whisper.cpp - fast speech-to-text with GPU acceleration",
    "license": "MIT",
    "repository": {
@@ -21,9 +21,9 @@
      "dist"
    ],
    "optionalDependencies": {
-     "@whisper-cpp-node/win32-x64": "0.2.11",
-     "@whisper-cpp-node/darwin-arm64": "0.2.3",
-     "@whisper-cpp-node/win32-ia32": "0.2.7"
+     "@whisper-cpp-node/darwin-arm64": "0.2.12",
+     "@whisper-cpp-node/win32-ia32": "0.2.7",
+     "@whisper-cpp-node/win32-x64": "0.2.11"
    },
    "devDependencies": {
      "@types/node": "^20.0.0",