npm - @data-netmonk/mona-chat-widget - Versions diffs - 2.4.3 → 2.6.0 - Mend

@data-netmonk/mona-chat-widget 2.4.3 → 2.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +65 -3
package/dist/index.cjs +80 -65
package/dist/index.d.ts +30 -0
package/dist/index.js +14738 -13972
package/dist/phoneme-mona/A.jpg.png +0 -0
package/dist/phoneme-mona/BP.jpg.png +0 -0
package/dist/phoneme-mona/ChJ.png +0 -0
package/dist/phoneme-mona/E.jpg.png +0 -0
package/dist/phoneme-mona/FV.png +0 -0
package/dist/phoneme-mona/I.jpg.png +0 -0
package/dist/phoneme-mona/KG.png +0 -0
package/dist/phoneme-mona/L.jpg.png +0 -0
package/dist/phoneme-mona/M.jpg.png +0 -0
package/dist/phoneme-mona/O.jpg.png +0 -0
package/dist/phoneme-mona/SZ.png +0 -0
package/dist/phoneme-mona/U.jpg.png +0 -0
package/dist/style.css +1 -1
package/package.json +3 -3

package/README.md CHANGED

@@ -8,9 +8,15 @@ Chat widget package developed by Netmonk data & solution team to be imported in
 ---
-**Latest Version Changes:**
+**Latest Version Changes (`v2.6.0`):**
-⚠️ **Breaking Changes:**
+✅ **Non-breaking Changes:**
+1. **Built-in voice mode button** - Chat widget now includes a microphone button to toggle voice mode directly from the input area
+2. **Speech-to-text flow for voice mode** - Recorded user audio is sent to the configured `VITE_STT_ENDPOINT`, then the transcription is forwarded as a normal chat message
+3. **Phoneme-driven avatar animation** - The widget can animate Mona's avatar during TTS playback based on phoneme/viseme IDs returned by the backend
+4. **Supported phoneme IDs** - `A`, `BP`, `ChJ`, `E`, `FV`, `I`, `KG`, `L`, `M`, `O`, `SZ`, `U`
+⚠️ **Previous Breaking Changes:**
 1. **Removed `type` and `agentType` props** - These parameters are no longer used and have been removed from all components
 2. **Renamed `botServerUrl` to `webhookUrl`** - For better clarity and consistency
 3. **`webhookUrl` is now required** - Must be provided as a prop
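The changelog lists twelve supported phoneme IDs, and the file list above shows one image per ID under `package/dist/phoneme-mona/`. The sketch below shows one way such IDs could be resolved to those images using the case-insensitive matching the README describes. It is not the widget's internal code. `PHONEME_ASSETS`, `normalizePhonemeId`, `phonemeAssetFor` and the `phoneme-mona/` base path are hypothetical names chosen for illustration. Only the file names come from the diff.

```typescript
// Hypothetical lookup table: normalized (upper-case) viseme ID -> image file.
// File names are copied from package/dist/phoneme-mona/ in this diff.
const PHONEME_ASSETS: Record<string, string> = {
  A: "A.jpg.png",
  BP: "BP.jpg.png",
  CHJ: "ChJ.png",
  E: "E.jpg.png",
  FV: "FV.png",
  I: "I.jpg.png",
  KG: "KG.png",
  L: "L.jpg.png",
  M: "M.jpg.png",
  O: "O.jpg.png",
  SZ: "SZ.png",
  U: "U.jpg.png",
};

// Case-insensitive normalization, so "ChJ", "chj" and "CHJ" share one key.
function normalizePhonemeId(id: string): string {
  return id.trim().toUpperCase();
}

// Returns the image path for a viseme ID, or null for IDs without a shipped
// image (a caller could then keep showing the idle avatar).
function phonemeAssetFor(id: string, baseUrl: string = "phoneme-mona/"): string | null {
  const key = normalizePhonemeId(id);
  // hasOwnProperty avoids matching inherited keys on the plain object.
  return Object.prototype.hasOwnProperty.call(PHONEME_ASSETS, key)
    ? baseUrl + PHONEME_ASSETS[key]
    : null;
}
```

Keeping the table keyed by the normalized form means the backend can send `ChJ`, `CHJ` or `chj` without the widget needing per-casing entries.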
@@ -94,7 +100,21 @@ Chat widget package developed by Netmonk data & solution team to be imported in
    cp .env.example .env
    ```
 3. Populate .env
-4. Enable mock mode (optional)
+4. Optional speech endpoints
+  To enable voice mode and phoneme-based avatar animation, set these environment variables in your `.env` file:
+  ```
+  VITE_STT_ENDPOINT=https://your-stt-service.example.com/transcribe
+  VITE_TTS_ENDPOINT=https://your-tts-service.example.com/synthesize
+  ```
+  `VITE_STT_ENDPOINT` is used by the built-in mic button to transcribe recorded audio.
+  `VITE_TTS_ENDPOINT` is used for TTS playback and to consume `visemes` data for avatar lip-sync.
+5. Optional TTS debug logging
+  To inspect TTS queue and playback lifecycle in browser console, set `VITE_DEBUG_TTS=true` in your `.env` file.
+  Keep this disabled in production to avoid noisy logs.
+6. Enable mock mode (optional)
    To test the chat widget without a backend server, set `VITE_USE_MOCK_RESPONSES=true` in your `.env` file. The widget will respond to messages like:
    - "start", "hello", "hi", "halo" - Greeting messages
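The environment variables introduced above can be collected into a single settings object. The sketch below is a hypothetical helper, not part of the package's API: `resolveSpeechConfig`, `SpeechConfig` and the rule that voice mode is available only when `VITE_STT_ENDPOINT` is set are assumptions for illustration. It accepts a plain key/value record so it can run outside Vite. Inside a Vite app the same values would come from `import.meta.env`, where variables loaded from `.env` are exposed as strings.

```typescript
// Hypothetical shape of the speech-related settings described in the README.
interface SpeechConfig {
  sttEndpoint: string | null;
  ttsEndpoint: string | null;
  voiceModeAvailable: boolean;
  debugTts: boolean;
  useMockResponses: boolean;
}

type EnvLike = Record<string, string | undefined>;

// Treats missing or blank values as "not configured".
function readEndpoint(env: EnvLike, key: string): string | null {
  const value = env[key]?.trim();
  return value ? value : null;
}

function resolveSpeechConfig(env: EnvLike): SpeechConfig {
  const sttEndpoint = readEndpoint(env, "VITE_STT_ENDPOINT");
  const ttsEndpoint = readEndpoint(env, "VITE_TTS_ENDPOINT");
  return {
    sttEndpoint,
    ttsEndpoint,
    // Assumption: the mic button is only useful when STT is configured.
    voiceModeAvailable: sttEndpoint !== null,
    // .env values arrive as strings, so only the exact string "true" enables a flag.
    debugTts: env.VITE_DEBUG_TTS === "true",
    useMockResponses: env.VITE_USE_MOCK_RESPONSES === "true",
  };
}
```

Accepting only the exact string `"true"` keeps values such as `"1"` or `"yes"` from enabling debug logging by accident; the widget's own parsing may differ.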
@@ -281,6 +301,48 @@ For responses with buttons:
 ---
+### Voice Mode & Phoneme Support
+---
+The widget now includes a built-in microphone button in the input area. Clicking the button toggles voice mode, requests microphone access, records speech, sends the recorded audio to `VITE_STT_ENDPOINT`, and forwards the returned transcription as a regular user message.
+During TTS playback, the header avatar can switch between phoneme images using the `visemes` payload returned by the TTS service. The widget currently supports these phoneme IDs:
+- `A`
+- `BP`
+- `ChJ`
+- `E`
+- `FV`
+- `I`
+- `KG`
+- `L`
+- `M`
+- `O`
+- `SZ`
+- `U`
+Phoneme IDs are normalized case-insensitively in the widget, so values such as `ChJ` and `CHJ` resolve to the same avatar image.
+**Expected TTS response shape:**
+```json
+{
+  "audioBase64": "<base64-audio>",
+  "contentType": "audio/mpeg",
+  "durationMs": 1450,
+  "visemes": [
+    { "id": "M", "startMs": 0, "endMs": 120 },
+    { "id": "A", "startMs": 121, "endMs": 260 },
+    { "id": "SZ", "startMs": 261, "endMs": 420 }
+  ]
+}
+```
+If `visemes` are omitted, TTS audio can still play normally, but the avatar will not switch phoneme frames dynamically.
+---
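To make the expected payload concrete, the sketch below types the documented TTS response and shows how a playback loop might pick the current phoneme frame. The field names come from the README. The helper names `toAudioDataUrl` and `activeVisemeAt` are hypothetical, and treating `startMs`/`endMs` as an inclusive range is an assumption, since the README does not define the boundary semantics.

```typescript
// Types mirroring the expected TTS response shape documented above.
interface Viseme {
  id: string;
  startMs: number;
  endMs: number;
}

interface TtsResponse {
  audioBase64: string;
  contentType: string;
  durationMs: number;
  visemes?: Viseme[]; // optional: audio still plays without it
}

// Builds a standard data: URL that an <audio> element can play directly.
function toAudioDataUrl(res: TtsResponse): string {
  return `data:${res.contentType};base64,${res.audioBase64}`;
}

// Returns the viseme ID active at playback position `currentMs`, or null
// when no range covers it (a gap, trailing silence, or no `visemes` at all).
// Assumption: ranges are inclusive on both ends, as in the example payload.
function activeVisemeAt(res: TtsResponse, currentMs: number): string | null {
  const visemes = res.visemes ?? [];
  for (const v of visemes) {
    if (currentMs >= v.startMs && currentMs <= v.endMs) {
      return v.id;
    }
  }
  return null;
}
```

In a browser, a caller could read `audio.currentTime * 1000` (the media element reports seconds) on each animation frame and pass it to `activeVisemeAt`. A `null` result maps naturally to the idle avatar, which matches the README's note that playback still works when `visemes` is omitted.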
 ### Guest User Support
 ---