npm - @data-netmonk/mona-chat-widget - Versions diffs - 2.6.0 → 2.6.1 - Mend

@data-netmonk/mona-chat-widget 2.6.0 → 2.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md CHANGED

@@ -8,7 +8,7 @@ Chat widget package developed by Netmonk data & solution team to be imported in
 ---
-**Latest Version Changes (`v2.6.0`):**
+**Latest Version Changes (`v2.6.1`):**
 ✅ **Non-breaking Changes:**
 1. **Built-in voice mode button** - Chat widget now includes a microphone button to toggle voice mode directly from the input area
@@ -104,12 +104,35 @@ Chat widget package developed by Netmonk data & solution team to be imported in
   To enable voice mode and phoneme-based avatar animation, set these environment variables in your `.env` file:
   ```
-  VITE_STT_ENDPOINT=https://your-stt-service.example.com/transcribe
+  VITE_STT_ENDPOINT=https://voice.netmonk-ai.tech/stt
+  VITE_STT_API_KEY=
+  VITE_STT_API_KEY_HEADER=x-api-key
+  VITE_STT_API_KEY_PREFIX=
   VITE_TTS_ENDPOINT=https://your-tts-service.example.com/synthesize
+  VITE_TTS_API_KEY=
+  VITE_TTS_API_KEY_HEADER=Authorization
+  VITE_TTS_API_KEY_PREFIX=Bearer
+  VITE_PREFER_WEBHOOK_TTS=true
+  VITE_TTS_MINIO_OBJECT_ENDPOINT=http://localhost:8000/minio/object
+  VITE_TTS_MINIO_API_KEY=
+  VITE_TTS_MINIO_API_KEY_HEADER=X-API-Key
+  VITE_TTS_MINIO_API_KEY_PREFIX=
+  VITE_TTS_MINIO_BUCKET=chatbot-tts
+  VITE_TTS_MINIO_DOWNLOAD=false
   ```
   `VITE_STT_ENDPOINT` is used by the built-in mic button to transcribe recorded audio.
-  `VITE_TTS_ENDPOINT` is used for TTS playback and to consume `visemes` data for avatar lip-sync.
+  For the Netmonk STT service, set `VITE_STT_API_KEY_HEADER=x-api-key` and leave `VITE_STT_API_KEY_PREFIX` empty so the widget sends the raw key value without `Bearer`.
+  `VITE_PREFER_WEBHOOK_TTS=true` makes the widget prioritize TTS assets from the webhook response (for example `tts_assets.items[].audio_object_key`) and use `VITE_TTS_ENDPOINT` only as a fallback.
+  The widget no longer connects directly to MinIO. Objects are fetched through the voice-engine endpoint:
+  `GET /minio/object?object_key=<key>&bucket=<bucket>&download=<true|false>`.
+  `VITE_TTS_MINIO_API_KEY` is optional. If set, the widget sends the `X-API-Key` header (or a custom header via `VITE_TTS_MINIO_API_KEY_HEADER`).
+  `VITE_TTS_MINIO_BUCKET` is used as the default bucket when the webhook payload does not include a bucket.
+  Set `VITE_TTS_MINIO_DOWNLOAD=true` to force download mode when fetching objects.
+  `VITE_TTS_ENDPOINT` is used for fallback TTS playback.
+  The widget expects the TTS response body to contain raw audio binary such as `audio/wav`, and reads lip-sync metadata from `x-tts-visemes-b64`, `x-tts-phonemes-b64`, and `x-tts-phoneme-timeline-b64` response headers.
+  If your TTS endpoint is called cross-origin, the server must expose those headers with `Access-Control-Expose-Headers`.
+  If your TTS service uses a different header such as `x-api-key`, set `VITE_TTS_API_KEY_HEADER=x-api-key` and leave the prefix empty.
 5. Optional TTS debug logging
   To inspect TTS queue and playback lifecycle in browser console, set `VITE_DEBUG_TTS=true` in your `.env` file.
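
The header/prefix variables and the `/minio/object` query string added in this hunk could be combined roughly as follows. This is a minimal sketch, not the widget's actual code: `buildAuthHeaders` and `buildMinioObjectUrl` are hypothetical helper names, and joining a non-empty prefix to the key with a single space is an assumption based on the `Authorization` / `Bearer` defaults shown above.

```typescript
// Hypothetical helpers for illustration; the widget's real internals may differ.
interface ApiKeyConfig {
  apiKey?: string; // e.g. the value of VITE_STT_API_KEY
  header?: string; // e.g. the value of VITE_STT_API_KEY_HEADER
  prefix?: string; // e.g. the value of VITE_STT_API_KEY_PREFIX
}

// Returns an empty object when no key is configured, so no auth header is sent.
function buildAuthHeaders(
  cfg: ApiKeyConfig,
  defaultHeader: string
): Record<string, string> {
  const key = (cfg.apiKey ?? "").trim();
  if (!key) return {};
  const header = (cfg.header ?? "").trim() || defaultHeader;
  const prefix = (cfg.prefix ?? "").trim();
  // Assumption: prefix and key are separated by one space ("Bearer <key>").
  return { [header]: prefix ? `${prefix} ${key}` : key };
}

// Builds GET /minio/object?object_key=<key>&bucket=<bucket>&download=<true|false>.
function buildMinioObjectUrl(
  endpoint: string,
  objectKey: string,
  bucket: string,
  download: boolean
): string {
  const url = new URL(endpoint);
  url.searchParams.set("object_key", objectKey);
  url.searchParams.set("bucket", bucket);
  url.searchParams.set("download", String(download));
  return url.toString();
}
```

With these assumptions, an STT configuration of `x-api-key` plus an empty prefix yields the raw key value, while the TTS defaults yield an `Authorization: Bearer <key>` header.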
@@ -307,7 +330,22 @@ For responses with buttons:
 The widget now includes a built-in microphone button in the input area. Clicking the button toggles voice mode, requests microphone access, records speech, sends the recorded audio to `VITE_STT_ENDPOINT`, and forwards the returned transcription as a regular user message.
-During TTS playback, the header avatar can switch between phoneme images using the `visemes` payload returned by the TTS service. The widget currently supports these phoneme IDs:
+If `VITE_STT_API_KEY` is set, the STT request also includes the configured auth header. For `https://voice.netmonk-ai.tech/stt`, use `x-api-key` without any prefix. Other providers can still use `Authorization: Bearer <key>` or any custom header via `.env`.
+**Expected STT request/response shape:**
+- Request body: `multipart/form-data`
+- Audio field name: `audio`
+- Request headers: `Accept: application/json`, plus `x-api-key: <key>` when configured for the Netmonk STT service
+- Expected JSON response:
+```json
+{
+  "text": "Halo, saya mau tanya status tiket saya"
+}
+```
+During TTS playback, the header avatar can switch between phoneme images using the metadata returned by the TTS service. The widget currently supports these phoneme IDs:
 - `A`
 - `BP`
@@ -324,22 +362,65 @@ During TTS playback, the header avatar can switch between phoneme images using t
 Phoneme IDs are normalized case-insensitively in the widget, so values such as `ChJ` and `CHJ` resolve to the same avatar image.
-**Expected TTS response shape:**
+**Preferred webhook TTS asset shape (MinIO-first):**
 ```json
 {
-  "audioBase64": "<base64-audio>",
-  "contentType": "audio/mpeg",
-  "durationMs": 1450,
-  "visemes": [
-    { "id": "M", "startMs": 0, "endMs": 120 },
-    { "id": "A", "startMs": 121, "endMs": 260 },
-    { "id": "SZ", "startMs": 261, "endMs": 420 }
-  ]
+  "messages": [{ "type": "text", "text": "Halo, ada yang bisa saya bantu?" }],
+  "tts_assets": {
+    "items": [
+      {
+        "bucket": "chatbot-tts",
+        "response_index": 1,
+        "audio_object_key": "tts/...wav",
+        "phoneme_object_key": "tts/...phonemes.txt",
+        "phoneme_timeline_object_key": "tts/...phoneme-timeline.json",
+        "viseme_timeline_object_key": "tts/...viseme-timeline.json"
+      }
+    ]
+  }
 }
 ```
-If `visemes` are omitted, TTS audio can still play normally, but the avatar will not switch phoneme frames dynamically.
+When this payload is available, the widget resolves MinIO object keys first for audio and timeline metadata. If no usable asset is found, it falls back to `VITE_TTS_ENDPOINT`.
+**Expected TTS response shape:**
+```text
+body: <raw WAV or other audio binary>
+content-type: audio/wav
+x-tts-visemes-b64: W3siaWQiOiJNIiwic3RhcnRNcyI6MCwiZW5kTXMiOjEyMH1d
+x-tts-phonemes-b64: TSBBCg==
+x-tts-phoneme-timeline-b64: W3sicGhvbmVtZSI6Ik0iLCJzdGFydE1zIjowLCJlbmRNcyI6MTIwfV0=
+```
+`x-tts-phonemes-b64` is the legacy phoneme string, while `x-tts-phoneme-timeline-b64` is the timed phoneme timeline. The base64 headers should decode as:
+Decoded `x-tts-phonemes-b64`:
+```text
+M A
+```
+Decoded `x-tts-phoneme-timeline-b64`:
+```json
+[
+  { "phoneme": "M", "startMs": 0, "endMs": 120 }
+]
+```
+Decoded `x-tts-visemes-b64`:
+```json
+[
+  { "id": "M", "startMs": 0, "endMs": 120 },
+  { "id": "A", "startMs": 121, "endMs": 260 },
+  { "id": "SZ", "startMs": 261, "endMs": 420 }
+]
+```
+The widget still supports the older JSON body format as a fallback, but header-based metadata is now the preferred format for binary audio responses.
 ---
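
To illustrate the header-based metadata format described in this hunk, the sketch below decodes a base64 header value into viseme frames. It is an assumption-laden example rather than the widget's implementation: `decodeBase64Utf8`, `parseVisemesHeader`, and the `VisemeFrame` type are hypothetical names, and returning an empty list on malformed input is a design choice made here so that audio could still play without lip-sync. Note that the sample `x-tts-visemes-b64` value in the README decodes to a single frame (`M`, 0 to 120 ms), not the three frames listed in the decoded example.

```typescript
// Hypothetical types and helpers for illustration only.
interface VisemeFrame {
  id: string;
  startMs: number;
  endMs: number;
}

// Decodes a base64 string into UTF-8 text using standard web APIs.
function decodeBase64Utf8(value: string): string {
  const binary = atob(value);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// Type guard: keeps only entries that look like { id, startMs, endMs }.
function isVisemeFrame(item: unknown): item is VisemeFrame {
  if (typeof item !== "object" || item === null) return false;
  const rec = item as Record<string, unknown>;
  return (
    typeof rec.id === "string" &&
    typeof rec.startMs === "number" &&
    typeof rec.endMs === "number"
  );
}

// Parses an x-tts-visemes-b64 header value; returns [] when the header is
// missing or malformed (assumed fallback behavior, not confirmed by the README).
function parseVisemesHeader(headerValue: string | null): VisemeFrame[] {
  if (!headerValue) return [];
  try {
    const parsed: unknown = JSON.parse(decodeBase64Utf8(headerValue));
    return Array.isArray(parsed) ? parsed.filter(isVisemeFrame) : [];
  } catch {
    return [];
  }
}
```

In a browser, the header value would typically come from `response.headers.get("x-tts-visemes-b64")`, which only returns a value for cross-origin requests when the server lists the header in `Access-Control-Expose-Headers`.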