dictate-button 1.8.0 → 2.0.0

This diff shows the changes between publicly released package versions as they appear in their public registries. It is provided for informational purposes only.
package/README.md CHANGED
@@ -1,5 +1,6 @@
- # Dictate Button (Web Component)
- ![NPM Version](https://img.shields.io/npm/v/dictate-button)
+ # Dictate Button
+ [![NPM Version](https://img.shields.io/npm/v/dictate-button)](https://www.npmjs.com/package/dictate-button)
+ [![Tests](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml/badge.svg)](https://github.com/dictate-button/dictate-button/actions/workflows/test.yml)

  A customizable web component that adds speech-to-text dictation capabilities to any text input, textarea field, or contenteditable element on your website.

@@ -46,10 +47,9 @@ Both auto-inject modes:

  #### Option 1: Using the exclusive auto-inject script

- In your HTML `<head>` tag, add the following script tags:
+ In your HTML `<head>` tag, add the following script tag:

  ```html
- <script type="module" crossorigin src="https://cdn.dictate-button.io/dictate-button.js"></script>
  <script type="module" crossorigin src="https://cdn.dictate-button.io/inject-exclusive.js"></script>
  ```

@@ -65,10 +65,9 @@ Add the `data-dictate-button-on` attribute to any `textarea`, `input[type="text"

  #### Option 2: Using the inclusive auto-inject script

- In your HTML `<head>` tag, add the following script tags:
+ In your HTML `<head>` tag, add the following script tag:

  ```html
- <script type="module" crossorigin src="https://cdn.dictate-button.io/dictate-button.js"></script>
  <script type="module" crossorigin src="https://cdn.dictate-button.io/inject-inclusive.js"></script>
  ```

@@ -91,20 +90,12 @@ Import the component and use it directly in your code:

  ```html
  <script type="module" crossorigin src="https://cdn.dictate-button.io/dictate-button.js"></script>

- <dictate-button size="30" api-endpoint="https://api.dictate-button.io/transcribe" language="en"></dictate-button>
+ <dictate-button size="30" api-endpoint="wss://api.dictate-button.io/v2/transcribe" language="en"></dictate-button>
  ```

  ### From NPM

- Import once for your app.
-
- The button component:
-
- ```js
- import 'dictate-button'
- ```
-
- The auto-inject script:
+ Import once for your app:

  ```js
  // For selected text fields (with data-dictate-button-on attribute):
@@ -123,6 +114,7 @@ Tip: You can also import from subpaths (e.g., 'dictate-button/libs/injectDictate
  for smaller bundles, if your bundler resolves package subpath exports.

  ```js
+ import 'dictate-button' // Required when using library functions directly
  import { injectDictateButton, injectDictateButtonOnLoad } from 'dictate-button/libs'

  // Inject dictate buttons immediately to matching elements
@@ -131,7 +123,7 @@ injectDictateButton(
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
-   customApiEndpoint: 'https://api.example.com/transcribe' // Optional custom API endpoint
+   apiEndpoint: 'wss://api.example.com/transcribe' // Optional custom API endpoint
  }
  )

@@ -141,7 +133,7 @@ injectDictateButtonOnLoad(
  {
    buttonSize: 30, // Button size in pixels (optional; default: 30)
    verbose: false, // Log events to console (optional; default: false)
-   customApiEndpoint: 'https://api.example.com/transcribe', // Optional custom API endpoint
+   apiEndpoint: 'wss://api.example.com/transcribe', // Optional custom API endpoint
    watchDomChanges: true // Watch for DOM changes (optional; default: false)
  }
  )
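The two hunks above show that 2.0.0 renames the injection option `customApiEndpoint` to `apiEndpoint` (now a `wss://` URL). Call sites written against 1.8.x can be normalized with a small shim; this is an illustrative sketch, not part of the package, and the helper name `upgradeInjectOptions` is hypothetical:

```javascript
// Illustrative 1.8.x -> 2.0.0 option normalizer (not shipped by the package):
// renames `customApiEndpoint` to `apiEndpoint`, keeping an explicit
// `apiEndpoint` if the caller already provides one.
function upgradeInjectOptions(options) {
  const { customApiEndpoint, ...rest } = options ?? {};
  if (customApiEndpoint === undefined) return { ...rest };
  return { ...rest, apiEndpoint: rest.apiEndpoint ?? customApiEndpoint };
}
```

The normalized object can then be passed to `injectDictateButton` / `injectDictateButtonOnLoad` unchanged.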
@@ -155,36 +147,55 @@ The wrapper also has the `dictate-button-wrapper` class for easy styling.

  The dictate-button component emits the following events:

- - `recording:started`: Fired when user starts recording.
- - `recording:stopped`: Fired when user stops recording.
- - `recording:failed`: Fired when an error occurs during recording.
- - `transcribing:started`: Fired when transcribing is started.
- - `transcribing:finished`: Fired when transcribing is complete. The event detail contains the transcribed text.
- - `transcribing:failed`: Fired when an error occurs during transcribing.
+ - `dictate-start`: Fired when transcription starts (after microphone access is granted and WebSocket connection is established).
+ - `dictate-text`: Fired during transcription when text is available. This includes both interim (partial) transcripts that may change and final transcripts. The event detail contains the current transcribed text.
+ - `dictate-end`: Fired when transcription ends. The event detail contains the final transcribed text.
+ - `dictate-error`: Fired when an error occurs (microphone access denied, WebSocket connection failure, server error, etc.). The event detail contains the error message.
+
+ The typical flow is:
+
+ > dictate-start -> dictate-text (multiple times) -> dictate-end
+
+ In case of an error, the `dictate-error` event is fired.

  Example event handling:

  ```javascript
  const dictateButton = document.querySelector('dictate-button');

- dictateButton.addEventListener('transcribing:finished', (event) => {
-   const transcribedText = event.detail;
-   console.log('Transcribed text:', transcribedText);
-
-   // Add the text to your input field
-   document.querySelector('#my-input').value += transcribedText;
+ dictateButton.addEventListener('dictate-start', () => {
+   console.log('Transcription started');
+ });
+
+ dictateButton.addEventListener('dictate-text', (event) => {
+   const currentText = event.detail;
+   console.log('Current text:', currentText);
+   // Update UI with interim/partial transcription
+ });
+
+ dictateButton.addEventListener('dictate-end', (event) => {
+   const finalText = event.detail;
+   console.log('Final transcribed text:', finalText);
+
+   // Add the final text to your input field
+   document.querySelector('#my-input').value += finalText;
+ });
+
+ dictateButton.addEventListener('dictate-error', (event) => {
+   const error = event.detail;
+   console.error('Transcription error:', error);
  });
  ```

  ## Attributes

- | Attribute | Type | Default | Description |
- |---------------|---------|-----------------------------------------|----------------------------------------|
- | size | number | 30 | Size of the button in pixels |
- | apiEndpoint | string | https://api.dictate-button.io/transcribe| API endpoint for transcription service |
- | language | string | (not set) | Optional language code (e.g., 'en', 'fr', 'de') which may speed up the transcription. |
- | theme | string | (inherits from page) | 'light' or 'dark' |
- | class | string | | Custom CSS class |
+ | Attribute | Type | Default | Description |
+ |---------------|---------|--------------------------------------------|-----------------------------------------|
+ | size | number | 30 | Size of the button in pixels |
+ | apiEndpoint | string | wss://api.dictate-button.io/v2/transcribe | WebSockets API endpoint of transcription service |
+ | language | string | en | Optional [language](https://github.com/dictate-button/dictate-button/wiki/Supported-Languages-and-Dialects) code (e.g., 'fr', 'de') |
+ | theme | string | (inherits from page) | 'light' or 'dark' |
+ | class | string | | Custom CSS class |

  ## Styling

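The hunk above replaces the six 1.8.x events with four `dictate-*` events. For code migrating listeners, an approximate name correspondence can be read off the two lists; note this mapping is inferred from the event descriptions alone (verify against the project's changelog), and the shim is illustrative, not part of the package:

```javascript
// Approximate 1.8.x -> 2.0.0 event-name mapping, inferred from the removed
// and added event lists above. Not shipped by the package.
const LEGACY_EVENT_MAP = {
  'recording:started': 'dictate-start',
  'transcribing:finished': 'dictate-end',
  'recording:failed': 'dictate-error',
  'transcribing:failed': 'dictate-error',
};

// Returns the v2 event name for a legacy one, or null where 2.0.0 has no
// direct equivalent (`recording:stopped` / `transcribing:started` fall
// between `dictate-text` updates in the new flow).
function v2EventFor(legacyName) {
  return LEGACY_EVENT_MAP[legacyName] ?? null;
}
```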
@@ -209,22 +220,29 @@ dictate-button::part(icon) {

  ## API Endpoint

- By default, dictate-button uses the `https://api.dictate-button.io/transcribe` endpoint for speech-to-text conversion.
+ By default, dictate-button uses the `wss://api.dictate-button.io/v2/transcribe` endpoint for real-time speech-to-text streaming.
  You can specify your own endpoint by setting the `apiEndpoint` attribute.

- The API expects:
- - POST request
- - Multipart form data with the following fields:
-   - `audio`: Audio data as a Blob (audio/webm format)
-   - `origin`: The origin of the website (automatically added)
-   - `language`: Optional language code (if provided as an attribute)
- - Response should be JSON with a `text` property containing the transcribed text
+ The API uses WebSocket for real-time transcription:
+ - **Protocol**: WebSocket (wss://)
+ - **Connection**: Opens WebSocket connection with optional language query parameter (e.g., `?language=en`)
+ - **Audio Format**: PCM16 audio data at 16kHz sample rate, sent as binary chunks
+ - **Messages Sent**:
+   - Binary audio data (Int16Array buffers) - Continuous stream of PCM16 audio chunks
+   - `{ type: 'close' }` - JSON message to signal end of audio stream and trigger finalization
+ - **Messages Received**: JSON messages with the following types:
+   - `{ type: 'session_opened', sessionId: string, expiresAt: number }` - Session started
+   - `{ type: 'interim_transcript', text: string }` - Interim (partial) transcription result that may change as more audio is processed
+   - `{ type: 'transcript', text: string, turn_order?: number }` - Final transcription result for the current turn
+   - `{ type: 'session_closed', code: number, reason: string }` - Session ended
+   - `{ type: 'error', error: string }` - Error occurred

  ## Browser Compatibility

  The dictate-button component requires the following browser features:
  - Web Components
- - MediaRecorder API
- - Fetch API
+ - MediaStream API (getUserMedia)
+ - Web Audio API (AudioContext, AudioWorklet)
+ - WebSocket API

  Works in all modern browsers (Chrome, Firefox, Safari, Edge).
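The new API Endpoint section above implies two client-side pieces: converting Web Audio `Float32` samples to PCM16 before sending, and reducing the received JSON messages to text updates. A minimal sketch of both, assuming the message shapes listed under "Messages Received"; the function names are illustrative and the component ships its own internal implementation:

```javascript
// Convert Float32 samples in [-1, 1] (as produced by an AudioWorklet) into
// Int16 PCM, clamping out-of-range values.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Reduce one parsed server message to a UI-level update, following the
// message types listed under "Messages Received" above.
function textUpdateFrom(message) {
  switch (message.type) {
    case 'interim_transcript':
      return { text: message.text, final: false };
    case 'transcript':
      return { text: message.text, final: true };
    case 'error':
      throw new Error(message.error);
    default:
      return null; // session_opened / session_closed carry no text
  }
}
```

In a browser client, `pcm.buffer` would be passed to `WebSocket.send()` for each chunk, followed by `socket.send(JSON.stringify({ type: 'close' }))` to finalize the stream.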