@tekyzinc/stt-component 0.1.0

# STT-Component

![Version](https://img.shields.io/badge/version-0.1.0-blue)

A framework-agnostic, browser-first speech-to-text package with real-time streaming transcription and mid-recording Whisper correction, powered by [@huggingface/transformers](https://github.com/huggingface/transformers.js).

## Features

- **Streaming transcription** -- real-time interim text as you speak
- **Mid-recording Whisper correction** -- automatic correction cycles triggered by speech pauses or at forced intervals
- **Configurable Whisper models** -- tiny, base, small, or medium (ONNX via transformers.js)
- **WebGPU + WASM** -- GPU-accelerated inference in Chrome/Edge with automatic WASM fallback for Firefox/Safari
- **Event-driven API** -- subscribe to `transcript`, `correction`, `error`, and `status` events
- **Framework-agnostic** -- works with React, Vue, Svelte, vanilla JS, or any other framework
- **Web Worker inference** -- model loading and transcription run in a dedicated worker thread, so the main thread is not blocked
- **Configurable correction timing** -- pause threshold, forced interval, or disable correction entirely
- **Audio chunking** -- configurable chunk length and stride for long-form audio
- **Node.js support** -- inference is compatible with Node.js >= 18 via `@huggingface/transformers`; microphone capture requires browser APIs

## Quick Start

```bash
npm install @tekyzinc/stt-component
```

```typescript
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine({ model: 'tiny' });

engine.on('transcript', (text) => console.log('Interim:', text));
engine.on('correction', (text) => console.log('Corrected:', text));

// Top-level await: run this as an ES module
await engine.init();  // load the Whisper model
await engine.start(); // request the microphone and start recording
// ... user speaks ...
const finalText = await engine.stop(); // final transcription of the full recording
console.log('Final:', finalText);

engine.destroy(); // release the worker, microphone, and listeners
```

## API Reference

### STTEngine

The main class. Extends `TypedEventEmitter<STTEvents>`.

#### `constructor(config?: STTConfig, workerUrl?: URL)`

Creates a new engine instance. All config fields are optional; defaults are applied for every field you omit (see [Configuration](#configuration)).
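
Because every field is optional, a partial config has to be merged with the defaults from the Configuration table. The sketch below shows one way such a merge can work, assuming a one-level-deep merge of the `correction` and `chunking` groups. `SketchConfig`, `SKETCH_DEFAULTS`, and `mergeConfig` are hypothetical stand-ins, not the package's `STTConfig`, `DEFAULT_STT_CONFIG`, or `resolveConfig`, whose actual behavior may differ.

```typescript
// Hypothetical mirror of the options in the Configuration table (not the package's STTConfig)
interface SketchConfig {
  model: 'tiny' | 'base' | 'small' | 'medium';
  backend: 'webgpu' | 'wasm' | 'auto';
  language: string;
  dtype: string;
  correction: { enabled: boolean; pauseThreshold: number; forcedInterval: number };
  chunking: { chunkLengthS: number; strideLengthS: number };
}

// A partial config: top-level fields and nested groups may each be omitted
type PartialSketchConfig = Partial<Omit<SketchConfig, 'correction' | 'chunking'>> & {
  correction?: Partial<SketchConfig['correction']>;
  chunking?: Partial<SketchConfig['chunking']>;
};

// Defaults copied from the Configuration table
const SKETCH_DEFAULTS: SketchConfig = {
  model: 'tiny',
  backend: 'auto',
  language: 'en',
  dtype: 'q4',
  correction: { enabled: true, pauseThreshold: 3000, forcedInterval: 5000 },
  chunking: { chunkLengthS: 30, strideLengthS: 5 },
};

// Hypothetical merge: nested groups are merged one level deep,
// so overriding one nested field keeps the defaults of its siblings
function mergeConfig(partial: PartialSketchConfig = {}): SketchConfig {
  return {
    ...SKETCH_DEFAULTS,
    ...partial,
    correction: { ...SKETCH_DEFAULTS.correction, ...partial.correction },
    chunking: { ...SKETCH_DEFAULTS.chunking, ...partial.chunking },
  };
}

const cfg = mergeConfig({ model: 'base', correction: { pauseThreshold: 1500 } });
console.log(cfg.model, cfg.correction.pauseThreshold, cfg.correction.forcedInterval); // base 1500 5000
```

A shallow `{ ...defaults, ...partial }` alone would replace the whole `correction` object and silently drop `forcedInterval`, which is why the nested groups are merged separately here.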

#### `init(): Promise<void>`

Spawns the Web Worker and loads the Whisper model. Emits `status` events with download progress. Throws if the model fails to load.

#### `start(): Promise<void>`

Requests microphone access and begins recording. If correction is enabled, mid-recording correction cycles start running. The engine must be in the `ready` state (call `init()` first).

#### `stop(): Promise<string>`

Stops recording, runs a final Whisper transcription on the full audio, emits the final `correction` event, and returns the transcribed text.

#### `destroy(): void`

Terminates the worker, releases the microphone and AudioContext, and removes all event listeners. Call it when you are done with the engine.

#### `getState(): Readonly<STTState>`

Returns a snapshot of the current engine state.

#### `notifyPause(): void`

Manually signals a speech pause to the correction orchestrator, which may trigger an early correction cycle.
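
`notifyPause()` is useful when your app has its own voice-activity signal. As an illustration, the sketch below reports a pause once the RMS energy of incoming audio frames stays below a threshold for a given duration, then invokes a callback (where you would call `engine.notifyPause()`). The `PauseDetector` class, the 800 ms duration, and the `0.01` RMS threshold are hypothetical choices, not part of the package.

```typescript
// Hypothetical pause detector: not part of @tekyzinc/stt-component
class PauseDetector {
  private silentMs = 0;
  private notified = false;
  private readonly onPause: () => void;
  private readonly silenceMs: number;
  private readonly rmsThreshold: number;

  // onPause: e.g. () => engine.notifyPause()
  // silenceMs: how long the signal must stay quiet before a pause is reported
  // rmsThreshold: frames whose RMS is below this value count as silence
  constructor(onPause: () => void, silenceMs = 800, rmsThreshold = 0.01) {
    this.onPause = onPause;
    this.silenceMs = silenceMs;
    this.rmsThreshold = rmsThreshold;
  }

  // Root-mean-square energy of a frame of float samples in [-1, 1]
  static rms(frame: Float32Array): number {
    if (frame.length === 0) return 0;
    let sum = 0;
    for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
    return Math.sqrt(sum / frame.length);
  }

  // Feed one audio frame; frameMs is the frame's duration in milliseconds
  push(frame: Float32Array, frameMs: number): void {
    if (PauseDetector.rms(frame) < this.rmsThreshold) {
      this.silentMs += frameMs;
      // Fire once per silent stretch, as soon as it is long enough
      if (!this.notified && this.silentMs >= this.silenceMs) {
        this.notified = true;
        this.onPause();
      }
    } else {
      // Speech resumed: reset so the next pause can be reported
      this.silentMs = 0;
      this.notified = false;
    }
  }
}
```

In an app you would construct it as `new PauseDetector(() => engine.notifyPause())` and feed it frames from your own audio pipeline. Firing only once per silent stretch avoids flooding the orchestrator with repeated pause signals during long silences.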

#### `on(event, listener): void`

Subscribes to an event. Type-safe: TypeScript enforces the correct callback signature for each event.

#### `off(event, listener): void`

Unsubscribes a specific listener.

### Events

| Event | Callback Signature | Description |
|-------|-------------------|-------------|
| `transcript` | `(text: string) => void` | Streaming interim text during recording |
| `correction` | `(text: string) => void` | Whisper-corrected text that replaces the interim text |
| `error` | `(error: STTError) => void` | Actionable error (`{ code: string, message: string }`) |
| `status` | `(state: STTState) => void` | Engine state changes, including model download progress |
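
The type-safe `on`/`off` API can be modeled with a map from event names to listener signatures. The following is a minimal sketch of that pattern, assuming an event map shaped like the table above; `MiniEmitter` and `SketchEvents` are illustrative names, not the package's `TypedEventEmitter` or `STTEvents`, and `emit` is public here only for demonstration.

```typescript
// Event map mirroring the Events table (illustrative; STTError and STTState simplified)
type SketchEvents = {
  transcript: (text: string) => void;
  correction: (text: string) => void;
  error: (error: { code: string; message: string }) => void;
  status: (state: { status: string }) => void;
};

// Minimal typed emitter: each listener is checked against the event map
class MiniEmitter<Events extends Record<string, (...args: any[]) => void>> {
  private listeners: { [E in keyof Events]?: Set<Events[E]> } = {};

  on<E extends keyof Events>(event: E, listener: Events[E]): void {
    let set = this.listeners[event];
    if (!set) {
      set = new Set<Events[E]>();
      this.listeners[event] = set;
    }
    set.add(listener);
  }

  off<E extends keyof Events>(event: E, listener: Events[E]): void {
    this.listeners[event]?.delete(listener);
  }

  // Public in this sketch so it can be called directly
  emit<E extends keyof Events>(event: E, ...args: Parameters<Events[E]>): void {
    const set = this.listeners[event];
    if (!set) return;
    // Copy first so a listener may unsubscribe while being called
    for (const listener of Array.from(set)) listener(...args);
  }
}

const emitter = new MiniEmitter<SketchEvents>();
const onText = (text: string) => console.log(text.toUpperCase());
emitter.on('transcript', onText);
// emitter.on('status', onText); // compile error under `strict`: wrong listener signature
emitter.emit('transcript', 'hello');
emitter.off('transcript', onText);
```

Storing listeners in a `Set` keyed by event name makes `off` an O(1) removal and naturally ignores duplicate registrations of the same function.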

#### Error Codes

| Code | When |
|------|------|
| `MIC_DENIED` | Microphone access denied or unavailable |
| `MODEL_LOAD_FAILED` | Whisper model download or initialization failed |
| `TRANSCRIPTION_FAILED` | Whisper inference failed (recording continues) |
| `WORKER_ERROR` | Web Worker encountered an error |
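
The table implies that only `TRANSCRIPTION_FAILED` leaves a recording running. A UI might therefore map each code to a severity and a user-facing message. The policy below is a hypothetical example built from the table, not behavior defined by the package; in particular, treating `WORKER_ERROR` as fatal and defaulting unknown codes to fatal are assumptions.

```typescript
// Error codes from the table above
type SketchErrorCode = 'MIC_DENIED' | 'MODEL_LOAD_FAILED' | 'TRANSCRIPTION_FAILED' | 'WORKER_ERROR';

interface ErrorPolicy {
  fatal: boolean;      // should the UI stop or disable recording?
  userMessage: string; // text to show the user
}

// Hypothetical policy table; adjust to your application's needs
const ERROR_POLICY: Record<SketchErrorCode, ErrorPolicy> = {
  MIC_DENIED: { fatal: true, userMessage: 'Please allow microphone access.' },
  MODEL_LOAD_FAILED: { fatal: true, userMessage: 'The speech model could not be loaded.' },
  // Per the table, recording continues after a failed inference
  TRANSCRIPTION_FAILED: { fatal: false, userMessage: 'A transcription pass failed; still listening.' },
  // Assumption: a worker failure is treated as unrecoverable
  WORKER_ERROR: { fatal: true, userMessage: 'Speech processing stopped unexpectedly.' },
};

// Own-property check, so inherited keys such as "toString" are not mistaken for codes
function isKnownCode(code: string): code is SketchErrorCode {
  return Object.prototype.hasOwnProperty.call(ERROR_POLICY, code);
}

// Unknown codes (for example, ones added in a later version) default to fatal
function policyFor(code: string): ErrorPolicy {
  return isKnownCode(code)
    ? ERROR_POLICY[code]
    : { fatal: true, userMessage: `Unexpected error: ${code}` };
}
```

Inside an `error` listener this reduces to `const { fatal, userMessage } = policyFor(err.code);`, keeping the per-code decisions in one table instead of scattered `switch` branches.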

#### Engine States (`STTStatus`)

`idle` -> `loading` -> `ready` -> `recording` -> `processing` -> `ready`

| Status | Meaning |
|--------|---------|
| `idle` | Engine created but not initialized |
| `loading` | Model downloading / initializing |
| `ready` | Model loaded, ready to record |
| `recording` | Actively capturing audio |
| `processing` | Running final transcription after stop |
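
The lifecycle above can be encoded as a transition table, which is handy for asserting invariants in tests of your own integration. This sketch is based only on the documented sequence; the package may permit other transitions (for example, after `destroy()` or a failed model load) that are not modeled here, and `LifecycleStatus`, `TRANSITIONS`, and the helper functions are hypothetical.

```typescript
type LifecycleStatus = 'idle' | 'loading' | 'ready' | 'recording' | 'processing';

// Allowed next states, following idle -> loading -> ready -> recording -> processing -> ready
const TRANSITIONS: Record<LifecycleStatus, readonly LifecycleStatus[]> = {
  idle: ['loading'],
  loading: ['ready'],
  ready: ['recording'],
  recording: ['processing'],
  processing: ['ready'],
};

function canTransition(from: LifecycleStatus, to: LifecycleStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Walk a sequence of observed statuses and return the first illegal step, or null
function firstIllegalStep(
  seq: readonly LifecycleStatus[],
): [LifecycleStatus, LifecycleStatus] | null {
  let prev: LifecycleStatus | undefined = undefined;
  for (const next of seq) {
    if (prev !== undefined && !canTransition(prev, next)) return [prev, next];
    prev = next;
  }
  return null;
}
```

For example, recording every `state.status` from `status` events during a test run and passing the list to `firstIllegalStep` flags any unexpected jump, such as `idle` straight to `recording`.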

### Configuration

All fields are optional; the defaults are shown in the table.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `model` | `'tiny' \| 'base' \| 'small' \| 'medium'` | `'tiny'` | Whisper model size |
| `backend` | `'webgpu' \| 'wasm' \| 'auto'` | `'auto'` | Compute backend (`auto` = WebGPU with WASM fallback) |
| `language` | `string` | `'en'` | Transcription language |
| `dtype` | `string` | `'q4'` | Model quantization dtype |
| `correction.enabled` | `boolean` | `true` | Enable mid-recording Whisper correction |
| `correction.pauseThreshold` | `number` (ms) | `3000` | Silence duration that triggers a correction |
| `correction.forcedInterval` | `number` (ms) | `5000` | Maximum time between corrections; a correction is forced once it elapses |
| `chunking.chunkLengthS` | `number` (seconds) | `30` | Chunk length for Whisper processing |
| `chunking.strideLengthS` | `number` (seconds) | `5` | Stride length for overlapping chunks |
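
To make the timing and chunking options concrete, the sketch below shows (a) one plausible rule for when a correction cycle fires and (b) where chunk windows fall over a long recording. Both are assumptions: the rule is inferred from the option descriptions rather than taken from the package's `CorrectionOrchestrator`, and the windowing assumes the values are passed through to the transformers.js speech-recognition pipeline, which, to our understanding, advances each window by `chunkLengthS - 2 * strideLengthS` seconds. `shouldCorrect` and `chunkWindows` are hypothetical helpers.

```typescript
interface TimingSketch {
  enabled: boolean;
  pauseThreshold: number; // ms of silence that triggers a correction
  forcedInterval: number; // ms since the last correction after which one is forced
}

// Hypothetical rule: correct after a long enough pause, or once the forced interval has elapsed
function shouldCorrect(t: TimingSketch, silenceMs: number, msSinceLastCorrection: number): boolean {
  if (!t.enabled) return false;
  return silenceMs >= t.pauseThreshold || msSinceLastCorrection >= t.forcedInterval;
}

// Chunk windows in seconds, assuming window starts advance by chunkLength - 2 * stride
function chunkWindows(durationS: number, chunkLengthS: number, strideLengthS: number): Array<[number, number]> {
  const step = chunkLengthS - 2 * strideLengthS;
  if (step <= 0) throw new Error('chunkLengthS must exceed 2 * strideLengthS');
  const windows: Array<[number, number]> = [];
  for (let start = 0; ; start += step) {
    const end = start + chunkLengthS;
    windows.push([start, Math.min(end, durationS)]);
    if (end >= durationS) break; // this window reaches the end of the audio
  }
  return windows;
}

console.log(chunkWindows(60, 30, 5)); // windows [0, 30], [20, 50], [40, 60]
```

Under this assumption, the defaults give consecutive 30-second windows that overlap by 10 seconds (5 seconds of stride on each side), so words cut at a chunk boundary appear intact in a neighboring window.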

### STTState

Returned by `getState()` and emitted with `status` events.

```typescript
interface STTState {
  status: STTStatus;
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}
```
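
Because `status` listeners receive the full `STTState`, UI controls can be derived directly from it. A minimal sketch, assuming only the fields shown above; `controlsFor`, `statusLabel`, and `ControlState` are hypothetical helpers. The enablement rules follow the API reference: `start()` requires `ready`, and `stop()` only applies while `recording`.

```typescript
type UiStatus = 'idle' | 'loading' | 'ready' | 'recording' | 'processing';

// Local copy of the STTState shape documented above
interface EngineStateSketch {
  status: UiStatus;
  isModelLoaded: boolean;
  loadProgress: number; // 0-100
  backend: 'webgpu' | 'wasm' | null;
  error: string | null;
}

interface ControlState {
  canStart: boolean;
  canStop: boolean;
  label: string;
}

// Hypothetical label text; an error message takes precedence over the status
function statusLabel(s: EngineStateSketch): string {
  if (s.error) return `Error: ${s.error}`;
  switch (s.status) {
    case 'idle': return 'Not initialized';
    case 'loading': return `Loading model (${Math.round(s.loadProgress)}%)`;
    case 'ready': return `Ready (${s.backend ?? 'backend unknown'})`;
    case 'recording': return 'Recording';
    case 'processing': return 'Finalizing transcript';
  }
}

// Map engine state to button enablement and a status label
function controlsFor(s: EngineStateSketch): ControlState {
  return {
    canStart: s.status === 'ready',    // start() requires the ready state
    canStop: s.status === 'recording', // nothing to stop otherwise
    label: statusLabel(s),
  };
}
```

Wired up as `engine.on('status', (state) => render(controlsFor(state)))` (with your own `render`), the buttons can never call `start()` or `stop()` in a state where the engine would reject them.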

## Usage Examples

### Vanilla JavaScript

```html
<button id="start">Record</button>
<button id="stop">Stop</button>
<p id="output"></p>

<script type="module">
  // Bare module specifiers need a bundler or an import map in the browser
  import { STTEngine } from '@tekyzinc/stt-component';

  const engine = new STTEngine({ model: 'tiny' });
  const output = document.getElementById('output');

  engine.on('correction', (text) => {
    output.textContent = text;
  });

  engine.on('error', (err) => {
    console.error(`[${err.code}] ${err.message}`);
  });

  // Load the model before wiring up the buttons
  await engine.init();

  document.getElementById('start').onclick = () => engine.start();
  document.getElementById('stop').onclick = async () => {
    const final = await engine.stop();
    output.textContent = final;
  };
</script>
```

### React Pattern

The package has no React dependency; this example only shows the integration pattern.

```tsx
import { useEffect, useRef, useState } from 'react';
import { STTEngine } from '@tekyzinc/stt-component';

function VoiceInput() {
  const engineRef = useRef<STTEngine | null>(null);
  const [text, setText] = useState('');

  useEffect(() => {
    // One engine per mounted component, destroyed on unmount
    const engine = new STTEngine({ model: 'tiny' });
    engineRef.current = engine;

    engine.on('correction', setText);
    engine.on('error', (err) => console.error(err.code, err.message));

    // init() throws on model load failure, so handle the rejected promise
    engine.init().catch((err) => console.error('init failed:', err));

    return () => {
      engine.destroy();
      engineRef.current = null;
    };
  }, []);

  return (
    <div>
      <button onClick={() => engineRef.current?.start()}>Record</button>
      <button onClick={() => engineRef.current?.stop()}>Stop</button>
      <p>{text}</p>
    </div>
  );
}
```

### Error Handling

```typescript
import { STTEngine } from '@tekyzinc/stt-component';

const engine = new STTEngine();

engine.on('error', (err) => {
  switch (err.code) {
    case 'MIC_DENIED':
      alert('Please allow microphone access.');
      break;
    case 'MODEL_LOAD_FAILED':
      console.error('Model failed to load:', err.message);
      break;
    case 'TRANSCRIPTION_FAILED':
      // Non-fatal: recording continues, correction will retry
      console.warn('Transcription error:', err.message);
      break;
    case 'WORKER_ERROR':
      console.error('Worker error:', err.message);
      break;
  }
});

engine.on('status', (state) => {
  console.log(`Status: ${state.status}, progress: ${state.loadProgress}%`);
});

try {
  await engine.init();
} catch (e) {
  // init() throws on model load failure
  console.error('Engine initialization failed:', e);
}
```

## Browser Compatibility

| Browser | Backend | Notes |
|---------|---------|-------|
| Chrome 113+ | WebGPU | Full GPU acceleration |
| Edge 113+ | WebGPU | Full GPU acceleration |
| Firefox | WASM | Automatic fallback, slower inference |
| Safari 18+ | WASM | Automatic fallback, slower inference |

When `backend` is set to `'auto'` (the default), the engine tries WebGPU first and silently falls back to WASM. The backend actually in use is reported in the `backend` field of `STTState`.
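
The `'auto'` behavior can be approximated with standard WebGPU feature detection: `navigator.gpu` is only present in WebGPU-capable browsers, and `requestAdapter()` resolves to `null` when no suitable GPU adapter is available. The sketch below is an assumption about how such a check can work, not the package's own logic; `detectBackend` and `GpuNavigatorLike` are hypothetical, and the navigator is passed in as a parameter so the function can be exercised outside a browser.

```typescript
type DetectedBackend = 'webgpu' | 'wasm';

// Minimal structural type for the part of `navigator` used here (avoids needing @webgpu/types)
interface GpuNavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown | null> };
}

// Hypothetical detection: explicit choices pass through; 'auto' prefers WebGPU when an adapter exists
async function detectBackend(
  nav: GpuNavigatorLike,
  preference: 'webgpu' | 'wasm' | 'auto' = 'auto',
): Promise<DetectedBackend> {
  if (preference !== 'auto') return preference;
  if (!nav.gpu) return 'wasm'; // no WebGPU API at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm'; // null adapter: API present but unusable
  } catch {
    // Defensive: treat any rejection as "no WebGPU"
    return 'wasm';
  }
}
```

In a browser you would call it as `await detectBackend(navigator as unknown as GpuNavigatorLike)`. Checking for an adapter, not just for `navigator.gpu`, matters because the API can be exposed on machines whose GPU or driver is blocklisted.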

## Node.js

Compatible with Node.js >= 18 via `@huggingface/transformers`. In Node.js, the engine uses the WASM backend (no WebGPU). Audio capture (`startCapture`) depends on browser APIs (`navigator.mediaDevices`), so in Node.js you need to either provide pre-recorded audio to the worker directly or capture audio with a Node.js audio library.
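
Pre-recorded audio usually has to be converted into the form Whisper models consume: mono `Float32Array` samples in [-1, 1] at 16 kHz. The helpers below sketch that conversion for 16-bit PCM input. They are hypothetical utilities, not exports of this package; linear interpolation is a simple resampling choice rather than a high-quality one, and how the resulting samples are handed to the worker is not covered here.

```typescript
const WHISPER_SAMPLE_RATE = 16000; // Whisper models operate on 16 kHz audio

// Hypothetical helper: interleaved 16-bit PCM -> mono floats in [-1, 1], averaging channels
function pcm16ToMonoFloat32(pcm: Int16Array, channels = 1): Float32Array {
  const frames = Math.floor(pcm.length / channels);
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += pcm[i * channels + c];
    out[i] = sum / channels / 32768; // int16 range is [-32768, 32767]
  }
  return out;
}

// Hypothetical helper: linear-interpolation resampling (no anti-aliasing filter)
function resampleLinear(input: Float32Array, fromRate: number, toRate = WHISPER_SAMPLE_RATE): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const ratio = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

For a 48 kHz stereo recording, `resampleLinear(pcm16ToMonoFloat32(samples, 2), 48000)` yields one second of 16 kHz mono audio (16,000 samples) for every second of input.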

## Exports

The package exports all public types and utilities:

```typescript
// Main API
import { STTEngine } from '@tekyzinc/stt-component';

// Types
import type {
  STTConfig,
  STTState,
  STTEvents,
  STTError,
  STTModelSize,
  STTBackend,
  STTStatus,
} from '@tekyzinc/stt-component';

// Utilities (advanced usage)
import {
  DEFAULT_STT_CONFIG,
  resolveConfig,
  TypedEventEmitter,
  WorkerManager,
  CorrectionOrchestrator,
} from '@tekyzinc/stt-component';
```

## License

MIT