@data-netmonk/mona-chat-widget 2.4.3 → 2.6.0

package/README.md CHANGED
@@ -8,9 +8,15 @@ Chat widget package developed by Netmonk data & solution team to be imported in
 
  ---
 
- **Latest Version Changes:**
+ **Latest Version Changes (`v2.6.0`):**
 
- ⚠️ **Breaking Changes:**
+ **Non-breaking Changes:**
+ 1. **Built-in voice mode button** - Chat widget now includes a microphone button to toggle voice mode directly from the input area
+ 2. **Speech-to-text flow for voice mode** - Recorded user audio is sent to the configured `VITE_STT_ENDPOINT`, then the transcription is forwarded as a normal chat message
+ 3. **Phoneme-driven avatar animation** - The widget can animate Mona's avatar during TTS playback based on phoneme/viseme IDs returned by the backend
+ 4. **Supported phoneme IDs** - `A`, `BP`, `ChJ`, `E`, `FV`, `I`, `KG`, `L`, `M`, `O`, `SZ`, `U`
+
+ ⚠️ **Previous Breaking Changes:**
  1. **Removed `type` and `agentType` props** - These parameters are no longer used and have been removed from all components
  2. **Renamed `botServerUrl` to `webhookUrl`** - For better clarity and consistency
  3. **`webhookUrl` is now required** - Must be provided as a prop
@@ -94,7 +100,21 @@ Chat widget package developed by Netmonk data & solution team to be imported in
  cp .env.example .env
  ```
  3. Populate .env
- 4. Enable mock mode (optional)
+ 4. Optional speech endpoints
+
+ To enable voice mode and phoneme-based avatar animation, set these environment variables in your `.env` file:
+ ```
+ VITE_STT_ENDPOINT=https://your-stt-service.example.com/transcribe
+ VITE_TTS_ENDPOINT=https://your-tts-service.example.com/synthesize
+ ```
+
+ `VITE_STT_ENDPOINT` is used by the built-in mic button to transcribe recorded audio.
+ `VITE_TTS_ENDPOINT` is used for TTS playback and to consume `visemes` data for avatar lip-sync.
+ 5. Optional TTS debug logging
+
+ To inspect TTS queue and playback lifecycle in browser console, set `VITE_DEBUG_TTS=true` in your `.env` file.
+ Keep this disabled in production to avoid noisy logs.
+ 6. Enable mock mode (optional)
 
  To test the chat widget without a backend server, set `VITE_USE_MOCK_RESPONSES=true` in your `.env` file. The widget will respond to messages like:
  - "start", "hello", "hi", "halo" - Greeting messages
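The speech-related variables introduced in the hunk above could be read roughly as follows. This is a minimal sketch, not the widget's actual code: `readSpeechConfig`, `SpeechConfig`, and the rule that voice mode is enabled only when an STT endpoint is set are assumptions made for illustration. Only the three `VITE_*` names come from the README.

```typescript
// Hypothetical helper: derives speech settings from Vite-style env values.
// Only the VITE_* variable names are taken from the README diff.
type EnvRecord = Record<string, string | undefined>;

interface SpeechConfig {
  sttEndpoint: string | null;
  ttsEndpoint: string | null;
  voiceModeEnabled: boolean; // assumption: gated on the STT endpoint being set
  debugTts: boolean;
}

function readSpeechConfig(env: EnvRecord): SpeechConfig {
  // Treat missing or whitespace-only values as "not configured".
  const clean = (value: string | undefined): string | null => {
    const trimmed = (value ?? "").trim();
    return trimmed.length > 0 ? trimmed : null;
  };

  const sttEndpoint = clean(env["VITE_STT_ENDPOINT"]);
  const ttsEndpoint = clean(env["VITE_TTS_ENDPOINT"]);

  return {
    sttEndpoint,
    ttsEndpoint,
    voiceModeEnabled: sttEndpoint !== null,
    // Env values are strings, so compare against the literal "true".
    debugTts: env["VITE_DEBUG_TTS"] === "true",
  };
}
```

In a Vite application the argument would typically be `import.meta.env`; passing a plain object keeps the helper easy to exercise outside a Vite build.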
@@ -281,6 +301,48 @@ For responses with buttons:
 
  ---
 
+ ### Voice Mode & Phoneme Support
+
+ ---
+
+ The widget now includes a built-in microphone button in the input area. Clicking the button toggles voice mode, requests microphone access, records speech, sends the recorded audio to `VITE_STT_ENDPOINT`, and forwards the returned transcription as a regular user message.
+
+ During TTS playback, the header avatar can switch between phoneme images using the `visemes` payload returned by the TTS service. The widget currently supports these phoneme IDs:
+
+ - `A`
+ - `BP`
+ - `ChJ`
+ - `E`
+ - `FV`
+ - `I`
+ - `KG`
+ - `L`
+ - `M`
+ - `O`
+ - `SZ`
+ - `U`
+
+ Phoneme IDs are normalized case-insensitively in the widget, so values such as `ChJ` and `CHJ` resolve to the same avatar image.
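The case-insensitive normalization described above might look like the following. This is a sketch under stated assumptions rather than the widget's implementation: `normalizePhonemeId` and `SUPPORTED_PHONEME_IDS` are hypothetical names, and trimming whitespace and returning `null` for unknown IDs are illustrative choices the README does not specify.

```typescript
// The twelve IDs listed in the README, in their documented spelling.
const SUPPORTED_PHONEME_IDS = [
  "A", "BP", "ChJ", "E", "FV", "I", "KG", "L", "M", "O", "SZ", "U",
] as const;

type PhonemeId = (typeof SUPPORTED_PHONEME_IDS)[number];

// Lookup table keyed by the upper-cased form, so "CHJ", "chj" and "ChJ"
// all map back to the canonical "ChJ".
const CANONICAL_BY_UPPERCASE = new Map<string, PhonemeId>(
  SUPPORTED_PHONEME_IDS.map((id): [string, PhonemeId] => [id.toUpperCase(), id]),
);

// Hypothetical helper: returns the canonical ID, or null when the value is
// not one of the supported phoneme IDs (assumed fallback behavior).
function normalizePhonemeId(raw: string): PhonemeId | null {
  return CANONICAL_BY_UPPERCASE.get(raw.trim().toUpperCase()) ?? null;
}
```

Keying the table by the upper-cased form means the avatar image lookup only ever sees one spelling per phoneme, whatever casing the backend returns.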
+
+ **Expected TTS response shape:**
+
+ ```json
+ {
+   "audioBase64": "<base64-audio>",
+   "contentType": "audio/mpeg",
+   "durationMs": 1450,
+   "visemes": [
+     { "id": "M", "startMs": 0, "endMs": 120 },
+     { "id": "A", "startMs": 121, "endMs": 260 },
+     { "id": "SZ", "startMs": 261, "endMs": 420 }
+   ]
+ }
+ ```
+
+ If `visemes` are omitted, TTS audio can still play normally, but the avatar will not switch phoneme frames dynamically.
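To illustrate how a `visemes` array of this shape can drive the avatar, the sketch below picks the viseme that covers a given playback position. It is an assumption-laden example, not the widget's code: `activeVisemeId` is a hypothetical helper, entries are assumed to be sorted by `startMs` with inclusive integer bounds (as in the sample payload), and the playback time is floored to whole milliseconds so fractional positions do not fall between adjacent entries.

```typescript
// Field names follow the sample TTS response in the README.
interface Viseme {
  id: string;
  startMs: number;
  endMs: number;
}

// Hypothetical helper: returns the ID of the viseme active at playbackMs,
// or null when no viseme covers that moment or the payload has no visemes.
function activeVisemeId(
  visemes: Viseme[] | undefined,
  playbackMs: number,
): string | null {
  if (!visemes || visemes.length === 0) return null;

  const t = Math.floor(playbackMs);
  let candidate: Viseme | null = null;

  // Entries are assumed sorted by startMs, so the last one that has
  // already started is the only possible match.
  for (const viseme of visemes) {
    if (viseme.startMs > t) break;
    candidate = viseme;
  }

  return candidate !== null && t <= candidate.endMs ? candidate.id : null;
}
```

A caller could evaluate this on each animation frame with the audio element's current time in milliseconds and fall back to a resting avatar image whenever the result is `null`, which also covers responses that omit `visemes`.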
+
+ ---
+
  ### Guest User Support
 
  ---