@tekyzinc/stt-component 0.1.0
# STT-Component

A framework-agnostic, browser-first speech-to-text package with real-time streaming transcription and mid-recording Whisper correction, powered by [@huggingface/transformers](https://github.com/huggingface/transformers.js).

## Features

- **Streaming transcription** -- real-time interim text as you speak
- **Mid-recording Whisper correction** -- automatic correction cycles triggered by speech pauses or forced intervals
- **Configurable Whisper models** -- tiny, base, small, medium (ONNX via transformers.js)
- **WebGPU + WASM** -- GPU-accelerated inference in Chrome/Edge with automatic WASM fallback for Firefox/Safari
- **Event-driven API** -- subscribe to `transcript`, `correction`, `error`, and `status` events
- **Framework-agnostic** -- works with React, Vue, Svelte, vanilla JS, or any framework
- **Web Worker inference** -- non-blocking model loading and transcription via a dedicated worker thread
- **Configurable correction timing** -- pause threshold, forced interval, or disable entirely
- **Audio chunking** -- configurable chunk length and stride for long-form audio
- **Node.js support** -- compatible with Node.js >= 18 via @huggingface/transformers

## Quick Start

```bash
npm install @tekyzinc/stt-component
```

```typescript
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine({ model: 'tiny' });

engine.on('transcript', (text) => console.log('Interim:', text));
engine.on('correction', (text) => console.log('Corrected:', text));

await engine.init();  // load the Whisper model (emits 'status' progress)
await engine.start(); // request the microphone and begin recording
// ... user speaks ...
const finalText = await engine.stop(); // final Whisper pass over the full recording
```

## API Reference

### STTEngine

The main class. Extends `TypedEventEmitter<STTEvents>`.

#### `constructor(config?: STTConfig, workerUrl?: URL)`

Creates a new engine instance. All config fields are optional; any field you omit takes the default listed under Configuration.

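Because every field is optional, a partial config presumably gets merged over the defaults, with the nested `correction` and `chunking` groups merged field by field rather than replaced wholesale. The sketch below illustrates that idea with a hypothetical `mergeWithDefaults` helper and locally declared types. It is not the package's `resolveConfig` (whose signature is not documented here), and the default values are simply copied from the Configuration table.

```typescript
// Hypothetical local mirror of the documented options; the package's own
// STTConfig type may differ in detail.
type ModelSize = 'tiny' | 'base' | 'small' | 'medium';
type Backend = 'webgpu' | 'wasm' | 'auto';

interface ResolvedConfig {
  model: ModelSize;
  backend: Backend;
  language: string;
  dtype: string;
  correction: { enabled: boolean; pauseThreshold: number; forcedInterval: number };
  chunking: { chunkLengthS: number; strideLengthS: number };
}

// Every field optional, including the fields inside the nested groups.
type PartialConfig = Partial<Omit<ResolvedConfig, 'correction' | 'chunking'>> & {
  correction?: Partial<ResolvedConfig['correction']>;
  chunking?: Partial<ResolvedConfig['chunking']>;
};

// Defaults copied from the Configuration table.
const DEFAULTS: ResolvedConfig = {
  model: 'tiny',
  backend: 'auto',
  language: 'en',
  dtype: 'q4',
  correction: { enabled: true, pauseThreshold: 3000, forcedInterval: 5000 },
  chunking: { chunkLengthS: 30, strideLengthS: 5 },
};

// Hypothetical helper: merge top-level fields, then merge each nested group
// separately so unspecified nested fields keep their defaults.
function mergeWithDefaults(config: PartialConfig = {}): ResolvedConfig {
  return {
    ...DEFAULTS,
    ...config,
    correction: { ...DEFAULTS.correction, ...config.correction },
    chunking: { ...DEFAULTS.chunking, ...config.chunking },
  };
}

const cfg = mergeWithDefaults({ model: 'base', correction: { pauseThreshold: 1500 } });
console.log(cfg.model, cfg.correction.pauseThreshold, cfg.correction.forcedInterval); // base 1500 5000
```

Merging the nested groups separately is the design point: a shallow spread alone would drop `forcedInterval` and `enabled` as soon as a caller overrides only `pauseThreshold`.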
#### `init(): Promise<void>`

Spawns the Web Worker and loads the Whisper model. Emits `status` events with download progress. Rejects if the model fails to load.

#### `start(): Promise<void>`

Requests microphone access and begins recording, enabling mid-recording correction cycles. The engine must be in the `ready` state (call `init()` first).

#### `stop(): Promise<string>`

Stops recording, runs a final Whisper transcription on the full audio, emits the final `correction` event, and returns the transcribed text.

#### `destroy(): void`

Terminates the worker, releases the microphone and AudioContext, and removes all event listeners. Call it when you are done with the engine.

#### `getState(): Readonly<STTState>`

Returns a snapshot of the current engine state.

#### `notifyPause(): void`

Manually signals a speech pause to the correction orchestrator, which may trigger an early correction cycle.

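The README does not spell out the orchestrator's exact rules, but the two timing options suggest a decision along these lines: run a correction once the speaker has been silent for `pauseThreshold` ms, or once `forcedInterval` ms have passed since the last correction, whichever comes first. In this reading, `notifyPause()` amounts to reporting the pause early. `shouldCorrect`, `TimingSnapshot`, and the "skip when nothing new was said" rule are hypothetical, for illustration only; they are not the `CorrectionOrchestrator` API.

```typescript
// Hypothetical timing options, mirroring the `correction.*` config fields.
interface CorrectionTiming {
  enabled: boolean;
  pauseThreshold: number; // ms of silence before a correction (default 3000)
  forcedInterval: number; // max ms between corrections (default 5000)
}

// Hypothetical snapshot of what an orchestrator would track (all in ms).
interface TimingSnapshot {
  now: number;
  lastSpeechAt: number;     // last time speech was detected
  lastCorrectionAt: number; // when the previous correction cycle ran
  pauseSignaled: boolean;   // e.g. set by a manual notifyPause() call
}

type Trigger = 'pause' | 'forced' | null;

// Decide whether a correction cycle should run now, and why.
function shouldCorrect(t: CorrectionTiming, s: TimingSnapshot): Trigger {
  if (!t.enabled) return null;
  // Assumption: nothing new to correct if no speech arrived since the last cycle.
  if (s.lastSpeechAt <= s.lastCorrectionAt) return null;
  if (s.pauseSignaled || s.now - s.lastSpeechAt >= t.pauseThreshold) return 'pause';
  if (s.now - s.lastCorrectionAt >= t.forcedInterval) return 'forced';
  return null;
}

const timing: CorrectionTiming = { enabled: true, pauseThreshold: 3000, forcedInterval: 5000 };
console.log(shouldCorrect(timing, { now: 4000, lastSpeechAt: 3500, lastCorrectionAt: 0, pauseSignaled: false })); // null
console.log(shouldCorrect(timing, { now: 6000, lastSpeechAt: 5900, lastCorrectionAt: 0, pauseSignaled: false })); // forced
console.log(shouldCorrect(timing, { now: 6000, lastSpeechAt: 2500, lastCorrectionAt: 1000, pauseSignaled: false })); // pause
```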
#### `on(event, listener): void`

Subscribes to an event. Type-safe -- TypeScript enforces the correct callback signature for each event.

#### `off(event, listener): void`

Unsubscribes a specific listener.

### Events

| Event | Callback Signature | Description |
|-------|-------------------|-------------|
| `transcript` | `(text: string) => void` | Streaming interim text during recording |
| `correction` | `(text: string) => void` | Whisper-corrected text replacing interim text |
| `error` | `(error: STTError) => void` | Actionable error (`{ code: string, message: string }`) |
| `status` | `(state: STTState) => void` | Engine state changes |

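The type safety of `on`/`off` comes from keying listeners on an event map, so the event name selects the callback signature. The sketch below shows that general pattern with a minimal, hypothetical `MiniEmitter` class and an event map mirroring the table above, using simplified local stand-ins for `STTError` and `STTState`. It illustrates the technique only and is not the package's `TypedEventEmitter` source.

```typescript
// Simplified local stand-ins for the package's exported types.
interface STTErrorLike { code: string; message: string }
interface STTStateLike { status: string }

// Event name -> listener argument tuple, mirroring the Events table.
interface EventsLike {
  transcript: [text: string];
  correction: [text: string];
  error: [error: STTErrorLike];
  status: [state: STTStateLike];
}

// Loosely typed storage; the public methods are what enforce the types.
type AnyListener = (...args: any[]) => void;

class MiniEmitter<E extends Record<keyof E, unknown[]>> {
  private listeners = new Map<keyof E, Set<AnyListener>>();

  on<K extends keyof E>(event: K, listener: (...args: E[K]) => void): void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set<AnyListener>();
      this.listeners.set(event, set);
    }
    set.add(listener as AnyListener);
  }

  off<K extends keyof E>(event: K, listener: (...args: E[K]) => void): void {
    this.listeners.get(event)?.delete(listener as AnyListener);
  }

  emit<K extends keyof E>(event: K, ...args: E[K]): void {
    this.listeners.get(event)?.forEach((fn) => fn(...args));
  }
}

const emitter = new MiniEmitter<EventsLike>();
const log: string[] = [];
const onText = (text: string) => log.push(text);

emitter.on('correction', onText);
emitter.emit('correction', 'hello world');
emitter.off('correction', onText);
emitter.emit('correction', 'ignored'); // no listener any more
console.log(log.join(',')); // hello world

// emitter.on('error', (text: string) => {}); // would not compile: wrong signature
```

`off` only works with the same function reference that was passed to `on`, which is why the listener is kept in a named constant.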
#### Error Codes

| Code | When |
|------|------|
| `MIC_DENIED` | Microphone access denied or unavailable |
| `MODEL_LOAD_FAILED` | Whisper model download or initialization failed |
| `TRANSCRIPTION_FAILED` | Whisper inference failed (recording continues) |
| `WORKER_ERROR` | Web Worker encountered an error |

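If you branch on `err.code`, a string-literal union plus a `never` check makes the compiler flag any code you forget to handle. This sketch assumes the four codes above are the complete set. Because the package types `code` as a plain `string`, the `ErrorCode` union, the `isKnownCode` guard, and the `userMessage` helper are local and hypothetical; the messages paraphrase the table.

```typescript
// Hypothetical union of the documented codes (the package types `code` as string).
const ERROR_CODES = ['MIC_DENIED', 'MODEL_LOAD_FAILED', 'TRANSCRIPTION_FAILED', 'WORKER_ERROR'] as const;
type ErrorCode = (typeof ERROR_CODES)[number];

function isKnownCode(code: string): code is ErrorCode {
  return (ERROR_CODES as readonly string[]).includes(code);
}

// Map each code to a user-facing message. The `never` assignment stops
// compiling if a code is added to the union without a matching case.
function userMessage(err: { code: string; message: string }): string {
  if (!isKnownCode(err.code)) return `Unexpected error: ${err.message}`;
  const code: ErrorCode = err.code;
  switch (code) {
    case 'MIC_DENIED':
      return 'Microphone access was denied or is unavailable.';
    case 'MODEL_LOAD_FAILED':
      return 'The speech model could not be downloaded or initialized.';
    case 'TRANSCRIPTION_FAILED':
      return 'A transcription pass failed; recording continues.';
    case 'WORKER_ERROR':
      return 'The background worker encountered an error.';
    default: {
      const unreachable: never = code;
      return unreachable;
    }
  }
}

console.log(userMessage({ code: 'MIC_DENIED', message: 'Permission denied' }));
// Microphone access was denied or is unavailable.
```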
#### Engine States (`STTStatus`)

`idle` -> `loading` -> `ready` -> `recording` -> `processing` -> `ready`

| Status | Meaning |
|--------|---------|
| `idle` | Engine created but not initialized |
| `loading` | Model downloading / initializing |
| `ready` | Model loaded, ready to record |
| `recording` | Actively capturing audio |
| `processing` | Running final transcription after stop |

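The lifecycle above can be written down as a transition table. That is handy for guarding UI actions or for asserting on a sequence of `status` events in tests. The sketch encodes only the transitions shown in the flow line. The real engine may allow others (for example, returning to `idle` or reacting to errors), so treat `ALLOWED`, `canTransition`, and `isValidSequence` as hypothetical.

```typescript
type Status = 'idle' | 'loading' | 'ready' | 'recording' | 'processing';

// Transitions taken from the documented flow:
// idle -> loading -> ready -> recording -> processing -> ready
const ALLOWED: Record<Status, readonly Status[]> = {
  idle: ['loading'],
  loading: ['ready'],
  ready: ['recording'],
  recording: ['processing'],
  processing: ['ready'],
};

function canTransition(from: Status, to: Status): boolean {
  return ALLOWED[from].includes(to);
}

// Check that a recorded sequence of statuses follows the documented flow.
function isValidSequence(seq: readonly Status[]): boolean {
  for (let i = 1; i < seq.length; i++) {
    const prev = seq[i - 1];
    const next = seq[i];
    if (prev === undefined || next === undefined || !canTransition(prev, next)) return false;
  }
  return true;
}

console.log(isValidSequence(['idle', 'loading', 'ready', 'recording', 'processing', 'ready'])); // true
console.log(canTransition('idle', 'recording')); // false: init() must come first
```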
### Configuration

All fields are optional; defaults are shown in the table.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `model` | `'tiny' \| 'base' \| 'small' \| 'medium'` | `'tiny'` | Whisper model size |
| `backend` | `'webgpu' \| 'wasm' \| 'auto'` | `'auto'` | Compute backend (`auto` = WebGPU with WASM fallback) |
| `language` | `string` | `'en'` | Transcription language |
| `dtype` | `string` | `'q4'` | Model quantization dtype |
| `correction.enabled` | `boolean` | `true` | Enable mid-recording Whisper correction |
| `correction.pauseThreshold` | `number` (ms) | `3000` | Silence duration before triggering correction |
| `correction.forcedInterval` | `number` (ms) | `5000` | Maximum interval between forced corrections |
| `chunking.chunkLengthS` | `number` (seconds) | `30` | Chunk length for Whisper processing |
| `chunking.strideLengthS` | `number` (seconds) | `5` | Stride length for overlapping chunks |

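As far as we understand the transformers.js speech-recognition pipeline, it processes one window of `chunk_length_s` seconds at a time, consecutive windows overlap by twice the stride, and each new window starts `chunk_length_s - 2 * stride_length_s` seconds after the previous one. Assuming `chunkLengthS` and `strideLengthS` are passed through to those options (the README does not say so), the defaults give 30 s windows that start every 20 s. `windowOffsets` below is a hypothetical helper that reproduces only that windowing arithmetic, not part of either library.

```typescript
// Start/end times (seconds) of each window, following the windowing loop we
// understand transformers.js to use for chunked Whisper transcription.
function windowOffsets(durationS: number, chunkLengthS = 30, strideLengthS = 5): Array<[number, number]> {
  // Guard added in this sketch: with a non-positive jump the windows would never advance.
  if (chunkLengthS <= 2 * strideLengthS) {
    throw new Error('chunkLengthS must exceed 2 * strideLengthS');
  }
  const jump = chunkLengthS - 2 * strideLengthS; // 20 s with the defaults
  const windows: Array<[number, number]> = [];
  for (let start = 0; ; start += jump) {
    const end = start + chunkLengthS;
    windows.push([start, Math.min(end, durationS)]);
    if (end >= durationS) break; // this window reaches the end of the audio
  }
  return windows;
}

console.log(windowOffsets(60).map(([s, e]) => `${s}-${e}`).join(', ')); // 0-30, 20-50, 40-60
```

Under this reading, audio of 30 s or less fits in a single window, and a longer stride trades extra computation for more context at each chunk boundary.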
### STTState

Returned by `getState()` and emitted with `status` events.

```typescript
interface STTState {
  status: STTStatus;
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}
```

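Because `status` events carry the full state, UI controls can be derived from it in one place. The mapping below is an assumption built from the documented rules: `start()` needs `ready`, and stopping is treated as meaningful only while `recording`. `controlsFor`, `Controls`, and the local `State` copy of `STTState` are hypothetical.

```typescript
// Local copy of the documented STTState shape.
interface State {
  status: 'idle' | 'loading' | 'ready' | 'recording' | 'processing';
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}

interface Controls {
  canStart: boolean;
  canStop: boolean;
  label: string;
}

// Derive button states and a status label from one state snapshot.
function controlsFor(state: State): Controls {
  const label =
    state.status === 'loading'
      ? `Loading model... ${Math.round(state.loadProgress)}%`
      : state.error ?? state.status; // show the last error, if any
  return {
    canStart: state.status === 'ready',
    canStop: state.status === 'recording',
    label,
  };
}

const c = controlsFor({ status: 'loading', isModelLoaded: false, loadProgress: 42.4, backend: null, error: null });
console.log(c.canStart, c.label); // false Loading model... 42%
```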
## Usage Examples

### Vanilla JavaScript

```html
<!-- Elements the script looks up by id -->
<button id="start" disabled>Start</button>
<button id="stop" disabled>Stop</button>
<p id="output"></p>

<script type="module">
  // Bare package specifiers need a bundler (e.g. Vite) or an import map.
  import { STTEngine } from '@tekyzinc/stt-component';

  const engine = new STTEngine({ model: 'tiny' });
  const output = document.getElementById('output');
  const startBtn = document.getElementById('start');
  const stopBtn = document.getElementById('stop');

  engine.on('correction', (text) => {
    output.textContent = text;
  });

  engine.on('error', (err) => {
    console.error(`[${err.code}] ${err.message}`);
  });

  await engine.init(); // rejects if the model fails to load
  startBtn.disabled = false;

  startBtn.onclick = async () => {
    startBtn.disabled = true;
    try {
      await engine.start(); // prompts for microphone access
      stopBtn.disabled = false;
    } catch {
      startBtn.disabled = false; // e.g. microphone denied
    }
  };
  stopBtn.onclick = async () => {
    stopBtn.disabled = true;
    output.textContent = await engine.stop(); // final transcription
    startBtn.disabled = false;
  };
</script>
```

### React Pattern

The package has no React dependency; this only shows the integration pattern.

```tsx
import { useEffect, useRef, useState } from 'react';
import { STTEngine } from '@tekyzinc/stt-component';
import type { STTStatus } from '@tekyzinc/stt-component';

export function VoiceInput() {
  const engineRef = useRef<STTEngine | null>(null);
  const [text, setText] = useState('');
  const [status, setStatus] = useState<STTStatus>('idle');

  useEffect(() => {
    const engine = new STTEngine({ model: 'tiny' });
    engineRef.current = engine;

    engine.on('correction', setText);
    engine.on('status', (state) => setStatus(state.status));
    engine.on('error', (err) => console.error(err.code, err.message));

    // init() returns a promise; handle rejection instead of leaving it floating.
    engine.init().catch((e) => console.error('init failed', e));

    return () => {
      engine.destroy(); // releases the worker, microphone, and listeners
      engineRef.current = null;
    };
  }, []);

  return (
    <div>
      {/* start() requires the `ready` state */}
      <button disabled={status !== 'ready'} onClick={() => engineRef.current?.start().catch(console.error)}>
        Record
      </button>
      <button disabled={status !== 'recording'} onClick={() => engineRef.current?.stop().catch(console.error)}>
        Stop
      </button>
      <p>{text}</p>
    </div>
  );
}
```

### Error Handling

```typescript
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine();

engine.on('error', (err) => {
  switch (err.code) {
    case 'MIC_DENIED':
      alert('Please allow microphone access.');
      break;
    case 'MODEL_LOAD_FAILED':
      console.error('Model failed to load:', err.message);
      break;
    case 'TRANSCRIPTION_FAILED':
      // Non-fatal: recording continues, correction will retry
      console.warn('Transcription error:', err.message);
      break;
    case 'WORKER_ERROR':
      console.error('Worker error:', err.message);
      break;
    default:
      console.error(`[${err.code}] ${err.message}`);
  }
});

engine.on('status', (state) => {
  console.log(`Status: ${state.status}, progress: ${state.loadProgress}%`);
});

try {
  await engine.init();
} catch {
  // init() rejects on model load failure; offer a retry or a non-voice fallback here.
}
```

## Browser Compatibility

| Browser | Backend | Notes |
|---------|---------|-------|
| Chrome 113+ | WebGPU | Full GPU acceleration |
| Edge 113+ | WebGPU | Full GPU acceleration |
| Firefox | WASM | Automatic fallback, slower inference |
| Safari 18+ | WASM | Automatic fallback, slower inference |

When `backend` is set to `'auto'` (the default), the engine tries WebGPU first and silently falls back to WASM.

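You can run a similar check yourself with the standard WebGPU API. `navigator.gpu` exists only where WebGPU is exposed, and `requestAdapter()` resolves to `null` when no suitable GPU is available. The sketch takes the navigator as a parameter so it can run outside a browser. `pickBackend` and `GpuLike` are hypothetical and may not match the engine's actual probing logic.

```typescript
// Minimal structural type for the part of the WebGPU API this sketch uses.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU only when an adapter is actually available; otherwise WASM.
async function pickBackend(nav: unknown): Promise<'webgpu' | 'wasm'> {
  const gpu = (nav as { gpu?: GpuLike } | null | undefined)?.gpu;
  if (!gpu) return 'wasm'; // WebGPU not exposed (e.g. Node.js or browsers without WebGPU)
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm'; // null adapter: no usable GPU
  } catch {
    return 'wasm'; // treat probing failures like "no WebGPU"
  }
}

// In a browser this probes the real navigator; elsewhere it resolves to 'wasm'.
pickBackend(Reflect.get(globalThis, 'navigator')).then((backend) => {
  console.log('Backend:', backend);
});
```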
## Node.js

Compatible with Node.js >= 18 via `@huggingface/transformers`. In Node.js, the engine uses the WASM backend (no WebGPU). Audio capture (`startCapture`) relies on browser APIs (`navigator.mediaDevices`), so in Node.js you must either provide pre-recorded audio to the worker directly or capture audio with a Node.js audio library.

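Whisper models in transformers.js consume mono `Float32Array` samples in the range [-1, 1] at 16 kHz. Many Node.js audio sources, including WAV files and common capture libraries, yield signed 16-bit PCM instead, so a conversion like the one below is needed first. The README does not document how samples are handed to the engine's worker, so `pcm16ToFloat32` is only a hypothetical preprocessing helper. Resampling to 16 kHz is left out.

```typescript
// Convert interleaved signed 16-bit PCM to mono Float32 samples in [-1, 1).
// Multi-channel input is downmixed by averaging the channels of each frame.
function pcm16ToFloat32(pcm: Int16Array, channels = 1): Float32Array {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new Error('channels must be a positive integer');
  }
  const frames = Math.floor(pcm.length / channels); // trailing partial frame is dropped
  const out = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += pcm[f * channels + c];
    }
    out[f] = sum / channels / 32768; // 32768 = 2^15, int16 full scale
  }
  return out;
}

console.log(Array.from(pcm16ToFloat32(Int16Array.from([0, 16384, -32768])))); // [ 0, 0.5, -1 ]
```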
## Exports

The package exports all public types and utilities:

```typescript
// Main API
import { STTEngine } from '@tekyzinc/stt-component';

// Types
import type {
  STTConfig,
  STTState,
  STTEvents,
  STTError,
  STTModelSize,
  STTBackend,
  STTStatus,
} from '@tekyzinc/stt-component';

// Utilities (advanced usage)
import {
  DEFAULT_STT_CONFIG,
  resolveConfig,
  TypedEventEmitter,
  WorkerManager,
  CorrectionOrchestrator,
} from '@tekyzinc/stt-component';
```

## License

MIT