dictate-button 1.9.0 → 2.0.0

package/README.md CHANGED
@@ -1,5 +1,5 @@
  # Dictate Button
- ![NPM Version](https://img.shields.io/npm/v/dictate-button)
+ [![NPM Version](https://img.shields.io/npm/v/dictate-button)](https://www.npmjs.com/package/dictate-button)
  [![Tests](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml/badge.svg)](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml)
 
  A customizable web component that adds speech-to-text dictation capabilities to any text input, textarea field, or contenteditable element on your website.
@@ -90,7 +90,7 @@ Import the component and use it directly in your code:
  ```html
  <script type="module" crossorigin src="https://cdn.dictate-button.io/dictate-button.js"></script>
 
- <dictate-button size="30" api-endpoint="https://api.dictate-button.io/transcribe" language="en"></dictate-button>
+ <dictate-button size="30" api-endpoint="wss://api.dictate-button.io/v2/transcribe" language="en"></dictate-button>
  ```
 
  ### From NPM
@@ -123,7 +123,7 @@ injectDictateButton(
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
-   customApiEndpoint: 'https://api.example.com/transcribe' // Optional custom API endpoint
+   apiEndpoint: 'wss://api.example.com/transcribe' // Optional custom API endpoint
  }
  )
 
@@ -133,7 +133,7 @@ injectDictateButtonOnLoad(
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
-   customApiEndpoint: 'https://api.example.com/transcribe', // Optional custom API endpoint
+   apiEndpoint: 'wss://api.example.com/transcribe', // Optional custom API endpoint
    watchDomChanges: true // Watch for DOM changes (optional; default: false)
  }
  )
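
For sites upgrading from 1.x, the option rename shown in this hunk (`customApiEndpoint` → `apiEndpoint`) can be handled with a small adapter. A minimal sketch, assuming the only changes a custom endpoint needs are the new key name and the `wss://` scheme (the function name is illustrative, not part of the package; verify the URL shape against your own server):

```javascript
// Sketch: adapt a v1 injector options object to the v2 shape.
// Assumption: only the endpoint option changed (key renamed, URL now WebSocket).
function migrateInjectorOptions(v1Options) {
  const { customApiEndpoint, ...rest } = v1Options;
  if (customApiEndpoint === undefined) return rest;
  return {
    ...rest,
    // v2 expects a wss:// URL under the new `apiEndpoint` key.
    apiEndpoint: customApiEndpoint.replace(/^https:/, 'wss:'),
  };
}
```

For example, `migrateInjectorOptions({ buttonSize: 30, customApiEndpoint: 'https://api.example.com/transcribe' })` yields `{ buttonSize: 30, apiEndpoint: 'wss://api.example.com/transcribe' }`.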
@@ -147,42 +147,55 @@ The wrapper also has the `dictate-button-wrapper` class for easy styling.
 
  The dictate-button component emits the following events:
 
- - `recording:started`: Fired when user starts recording.
- - `recording:stopped`: Fired when user stops recording.
- - `recording:failed`: Fired when an error occurs during recording.
- - `transcribing:started`: Fired when transcribing is started.
- - `transcribing:finished`: Fired when transcribing is complete. The event detail contains the transcribed text.
- - `transcribing:failed`: Fired when an error occurs during transcribing.
+ - `dictate-start`: Fired when transcription starts (after microphone access is granted and the WebSocket connection is established).
+ - `dictate-text`: Fired during transcription whenever text is available. This includes both interim (partial) transcripts, which may still change, and final transcripts. The event detail contains the current transcribed text.
+ - `dictate-end`: Fired when transcription ends. The event detail contains the final transcribed text.
+ - `dictate-error`: Fired when an error occurs (microphone access denied, WebSocket connection failure, server error, etc.). The event detail contains the error message.
 
- The ideal scenario is when user first starts recording (`recording:started`), then stops recording (`recording:stopped`), then the recorded audio is sent to the server for processing (`transcribing:started`), and finally the transcribed text is received (`transcribing:finished`).
+ The typical flow is:
 
- > recording:started -> recording:stopped -> transcribing:started -> transcribing:finished
+ > dictate-start -> dictate-text (multiple times) -> dictate-end
 
- In case of an error in recording or transcribing, the `recording:failed` or `transcribing:failed` event is fired, respectively.
+ If an error occurs at any point, the `dictate-error` event is fired.
 
  Example event handling:
 
  ```javascript
  const dictateButton = document.querySelector('dictate-button');
 
- dictateButton.addEventListener('transcribing:finished', (event) => {
-   const transcribedText = event.detail;
-   console.log('Transcribed text:', transcribedText);
-
-   // Add the text to your input field
-   document.querySelector('#my-input').value += transcribedText;
+ dictateButton.addEventListener('dictate-start', () => {
+   console.log('Transcription started');
+ });
+
+ dictateButton.addEventListener('dictate-text', (event) => {
+   const currentText = event.detail;
+   console.log('Current text:', currentText);
+   // Update the UI with the interim/partial transcription
+ });
+
+ dictateButton.addEventListener('dictate-end', (event) => {
+   const finalText = event.detail;
+   console.log('Final transcribed text:', finalText);
+
+   // Add the final text to your input field
+   document.querySelector('#my-input').value += finalText;
+ });
+
+ dictateButton.addEventListener('dictate-error', (event) => {
+   const error = event.detail;
+   console.error('Transcription error:', error);
  });
  ```
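
The event flow documented above can also be consumed as a small state machine. Below is a hypothetical reducer (not part of the package; the state shape is illustrative) that tracks the `dictate-start -> dictate-text* -> dictate-end` sequence:

```javascript
// Sketch: a pure reducer tracking transcription state across the documented
// dictate-* events. Event names follow the README; everything else is assumed.
function transcriptReducer(state, event) {
  switch (event.type) {
    case 'dictate-start':
      return { status: 'listening', text: '' };
    case 'dictate-text':
      // Interim or final text for the current utterance.
      return { ...state, text: event.detail };
    case 'dictate-end':
      return { status: 'done', text: event.detail };
    case 'dictate-error':
      return { ...state, status: 'error', error: event.detail };
    default:
      return state;
  }
}
```

Keeping the state in one place like this makes it easy to render a live preview during `dictate-text` and commit the text only on `dictate-end`.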
 
  ## Attributes
 
- | Attribute   | Type   | Default                                  | Description                            |
- |-------------|--------|------------------------------------------|----------------------------------------|
- | size        | number | 30                                       | Size of the button in pixels           |
- | apiEndpoint | string | https://api.dictate-button.io/transcribe | API endpoint for transcription service |
- | language    | string | (not set)                                | Optional language code (e.g., 'en', 'fr', 'de') which may speed up the transcription. |
- | theme       | string | (inherits from page)                     | 'light' or 'dark'                      |
- | class       | string |                                          | Custom CSS class                       |
+ | Attribute   | Type   | Default                                   | Description                                         |
+ |-------------|--------|-------------------------------------------|-----------------------------------------------------|
+ | size        | number | 30                                        | Size of the button in pixels                        |
+ | apiEndpoint | string | wss://api.dictate-button.io/v2/transcribe | WebSocket API endpoint of the transcription service |
+ | language    | string | en                                        | Optional [language](https://github.com/dictate-button/dictate-button/wiki/Supported-Languages-and-Dialects) code (e.g., 'fr', 'de') |
+ | theme       | string | (inherits from page)                      | 'light' or 'dark'                                   |
+ | class       | string |                                           | Custom CSS class                                    |
 
  ## Styling
 
@@ -207,22 +220,29 @@ dictate-button::part(icon) {
 
  ## API Endpoint
 
- By default, dictate-button uses the `https://api.dictate-button.io/transcribe` endpoint for speech-to-text conversion.
+ By default, dictate-button uses the `wss://api.dictate-button.io/v2/transcribe` endpoint for real-time speech-to-text streaming.
  You can specify your own endpoint by setting the `apiEndpoint` attribute.
 
- The API expects:
- - POST request
- - Multipart form data with the following fields:
-   - `audio`: Audio data as a File (audio/webm format)
-   - `origin`: The origin of the website (automatically added)
-   - `language`: Optional language code (if provided as an attribute)
- - Response should be JSON with a `text` property containing the transcribed text
+ The API uses a WebSocket connection for real-time transcription:
+ - **Protocol**: WebSocket (wss://)
+ - **Connection**: Opens a WebSocket connection with an optional language query parameter (e.g., `?language=en`)
+ - **Audio Format**: PCM16 audio data at a 16 kHz sample rate, sent as binary chunks
+ - **Messages Sent**:
+   - Binary audio data (Int16Array buffers) - a continuous stream of PCM16 audio chunks
+   - `{ type: 'close' }` - a JSON message that signals the end of the audio stream and triggers finalization
+ - **Messages Received**: JSON messages with the following types:
+   - `{ type: 'session_opened', sessionId: string, expiresAt: number }` - session started
+   - `{ type: 'interim_transcript', text: string }` - interim (partial) transcription result that may change as more audio is processed
+   - `{ type: 'transcript', text: string, turn_order?: number }` - final transcription result for the current turn
+   - `{ type: 'session_closed', code: number, reason: string }` - session ended
+   - `{ type: 'error', error: string }` - an error occurred
 
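
The PCM16-at-16 kHz requirement above implies a float-to-int conversion between the Web Audio graph (which produces float samples in -1..1) and the socket. A sketch of that conversion; in the component this presumably happens in the AudioWorklet, and the function name here is illustrative:

```javascript
// Sketch: convert Web Audio float samples (-1..1) into the Int16Array chunks
// the WebSocket API expects. Samples are clamped before scaling to int16 range.
function floatToPcm16(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to -32768..32767
  }
  return pcm;
}
```

A client would then send each chunk as binary data, e.g. `socket.send(floatToPcm16(samples).buffer)`, assuming the `AudioContext` was created at (or resampled to) a 16 kHz sample rate.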
  ## Browser Compatibility
 
  The dictate-button component requires the following browser features:
  - Web Components
- - MediaRecorder API
- - Fetch API
+ - MediaStream API (getUserMedia)
+ - Web Audio API (AudioContext, AudioWorklet)
+ - WebSocket API
 
  Works in all modern browsers (Chrome, Firefox, Safari, Edge).
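
Tying the API Endpoint section back to the Events section: one plausible translation layer from the server's JSON messages to the component's `dictate-*` events. This mapping is inferred from the README alone, not taken from the package source, and `emit` is a stand-in for the component's internal event dispatcher:

```javascript
// Sketch: map the documented server messages onto the documented dictate-*
// events. Assumption: this correspondence is inferred, not confirmed upstream.
function dispatchServerMessage(rawJson, emit) {
  const msg = JSON.parse(rawJson);
  switch (msg.type) {
    case 'session_opened':
      return emit('dictate-start');
    case 'interim_transcript':
    case 'transcript':
      return emit('dictate-text', msg.text);
    case 'session_closed':
      return emit('dictate-end');
    case 'error':
      return emit('dictate-error', msg.error);
    default:
      return; // ignore unknown message types
  }
}
```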