@data-netmonk/mona-chat-widget 2.6.0 → 2.6.1

package/README.md CHANGED
@@ -8,7 +8,7 @@ Chat widget package developed by Netmonk data & solution team to be imported in
 
  ---
 
- **Latest Version Changes (`v2.6.0`):**
+ **Latest Version Changes (`v2.6.1`):**
 
  ✅ **Non-breaking Changes:**
  1. **Built-in voice mode button** - Chat widget now includes a microphone button to toggle voice mode directly from the input area
@@ -104,12 +104,35 @@ Chat widget package developed by Netmonk data & solution team to be imported in
 
  To enable voice mode and phoneme-based avatar animation, set these environment variables in your `.env` file:
  ```
- VITE_STT_ENDPOINT=https://your-stt-service.example.com/transcribe
+ VITE_STT_ENDPOINT=https://voice.netmonk-ai.tech/stt
+ VITE_STT_API_KEY=
+ VITE_STT_API_KEY_HEADER=x-api-key
+ VITE_STT_API_KEY_PREFIX=
  VITE_TTS_ENDPOINT=https://your-tts-service.example.com/synthesize
+ VITE_TTS_API_KEY=
+ VITE_TTS_API_KEY_HEADER=Authorization
+ VITE_TTS_API_KEY_PREFIX=Bearer
+ VITE_PREFER_WEBHOOK_TTS=true
+ VITE_TTS_MINIO_OBJECT_ENDPOINT=http://localhost:8000/minio/object
+ VITE_TTS_MINIO_API_KEY=
+ VITE_TTS_MINIO_API_KEY_HEADER=X-API-Key
+ VITE_TTS_MINIO_API_KEY_PREFIX=
+ VITE_TTS_MINIO_BUCKET=chatbot-tts
+ VITE_TTS_MINIO_DOWNLOAD=false
  ```
 
  `VITE_STT_ENDPOINT` is used by the built-in mic button to transcribe recorded audio.
- `VITE_TTS_ENDPOINT` is used for TTS playback and to consume `visemes` data for avatar lip-sync.
+ For the Netmonk STT service, set `VITE_STT_API_KEY_HEADER=x-api-key` and leave `VITE_STT_API_KEY_PREFIX` empty so the widget sends the raw key value without `Bearer`.
+ `VITE_PREFER_WEBHOOK_TTS=true` makes the widget prioritize TTS assets from the webhook response (for example `tts_assets.items[].audio_object_key`) and use `VITE_TTS_ENDPOINT` only as a fallback.
+ The widget no longer connects to MinIO directly. Objects are fetched through the voice-engine endpoint:
+ `GET /minio/object?object_key=<key>&bucket=<bucket>&download=<true|false>`.
+ `VITE_TTS_MINIO_API_KEY` is optional. If it is set, the widget sends an `X-API-Key` header (or a custom header configured via `VITE_TTS_MINIO_API_KEY_HEADER`).
+ `VITE_TTS_MINIO_BUCKET` is used as the default bucket when the webhook payload does not include a bucket.
+ Set `VITE_TTS_MINIO_DOWNLOAD=true` to force download mode when fetching objects.
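
The object-fetch URL described above could be assembled roughly as follows. This is a minimal sketch, not the widget's actual code: the query parameter names (`object_key`, `bucket`, `download`) come from the README text, while `buildMinioObjectUrl`, the `MinioObjectOptions` shape, and the sample key `tts/sample.wav` are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper: builds the voice-engine URL used to fetch a TTS object.
// Only the query parameter names are taken from the README; everything else
// is an illustrative assumption.
interface MinioObjectOptions {
  endpoint: string;      // e.g. the value of VITE_TTS_MINIO_OBJECT_ENDPOINT
  objectKey: string;     // e.g. tts_assets.items[].audio_object_key
  bucket?: string;       // bucket sent by the webhook payload, if any
  defaultBucket: string; // e.g. the value of VITE_TTS_MINIO_BUCKET
  download?: boolean;    // e.g. VITE_TTS_MINIO_DOWNLOAD === "true"
}

function buildMinioObjectUrl(opts: MinioObjectOptions): string {
  const url = new URL(opts.endpoint);
  url.searchParams.set("object_key", opts.objectKey);
  // Fall back to the configured default bucket when the payload omits one.
  url.searchParams.set("bucket", opts.bucket || opts.defaultBucket);
  url.searchParams.set("download", String(opts.download ?? false));
  return url.toString();
}

// Usage with the default values from the .env example.
const objectUrl = buildMinioObjectUrl({
  endpoint: "http://localhost:8000/minio/object",
  objectKey: "tts/sample.wav", // hypothetical object key
  defaultBucket: "chatbot-tts",
});
console.log(objectUrl);
```

Using `URL` and `searchParams` keeps object keys that contain `/` or other reserved characters correctly percent-encoded.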
+ `VITE_TTS_ENDPOINT` is used for fallback TTS playback.
+ The widget expects the TTS response body to contain raw audio binary such as `audio/wav`, and reads lip-sync metadata from `x-tts-visemes-b64`, `x-tts-phonemes-b64`, and `x-tts-phoneme-timeline-b64` response headers.
+ If your TTS endpoint is called cross-origin, the server must expose those headers with `Access-Control-Expose-Headers`.
+ If your TTS service uses a different header such as `x-api-key`, set `VITE_TTS_API_KEY_HEADER=x-api-key` and leave the prefix empty.
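
The key, header, and prefix variables for STT, TTS, and MinIO all follow the same pattern, which can be sketched as a single helper. This is an assumption about how the three values combine, not the widget's confirmed implementation: `buildAuthHeaders` is a hypothetical name, and joining prefix and key with a single space is inferred from the `Authorization: Bearer <key>` form mentioned in the README.

```typescript
// Hypothetical helper: turns a (key, header, prefix) triple from .env into
// request headers. The space between prefix and key is an assumption.
function buildAuthHeaders(
  apiKey: string | undefined,
  headerName: string,
  prefix: string | undefined,
): Record<string, string> {
  const key = (apiKey ?? "").trim();
  if (key === "") {
    return {}; // no key configured: send no auth header at all
  }
  const cleanPrefix = (prefix ?? "").trim();
  return { [headerName]: cleanPrefix ? `${cleanPrefix} ${key}` : key };
}

// Netmonk STT style: raw key in x-api-key, empty prefix.
console.log(buildAuthHeaders("my-key", "x-api-key", ""));
// Bearer style, as in the TTS defaults.
console.log(buildAuthHeaders("my-key", "Authorization", "Bearer"));
```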
  5. Optional TTS debug logging
 
  To inspect TTS queue and playback lifecycle in browser console, set `VITE_DEBUG_TTS=true` in your `.env` file.
@@ -307,7 +330,22 @@ For responses with buttons:
 
  The widget now includes a built-in microphone button in the input area. Clicking the button toggles voice mode, requests microphone access, records speech, sends the recorded audio to `VITE_STT_ENDPOINT`, and forwards the returned transcription as a regular user message.
 
- During TTS playback, the header avatar can switch between phoneme images using the `visemes` payload returned by the TTS service. The widget currently supports these phoneme IDs:
+ If `VITE_STT_API_KEY` is set, the STT request also includes the configured auth header. For `https://voice.netmonk-ai.tech/stt`, use `x-api-key` without any prefix. Other providers can still use `Authorization: Bearer <key>` or any custom header via `.env`.
+
+ **Expected STT request/response shape:**
+
+ - Request body: `multipart/form-data`
+ - Audio field name: `audio`
+ - Request headers: `Accept: application/json`, plus `x-api-key: <key>` when configured for the Netmonk STT service
+ - Expected JSON response:
+
+ ```json
+ {
+   "text": "Halo, saya mau tanya status tiket saya"
+ }
+ ```
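
The request and response shape listed above can be illustrated with two small helpers. This is a sketch under stated assumptions: the field name `audio`, the `Accept` header, and the `text` response property come from the README, while `buildSttRequest`, `extractTranscript`, and the `recording.webm` filename are hypothetical and may not match the widget's internals.

```typescript
// Hypothetical helper: builds the multipart body and headers for an STT call.
function buildSttRequest(
  audio: Blob,
  authHeaders: Record<string, string> = {},
): { body: FormData; headers: Record<string, string> } {
  const body = new FormData();
  // "audio" is the field name from the README; the filename is an assumption.
  body.append("audio", audio, "recording.webm");
  // Content-Type is deliberately not set: the runtime adds the multipart
  // boundary itself when the body is a FormData instance.
  return { body, headers: { Accept: "application/json", ...authHeaders } };
}

// Hypothetical helper: pulls the transcript out of the JSON response,
// returning null when the payload does not have a usable "text" string.
function extractTranscript(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) {
    return null;
  }
  const text = (payload as { text?: unknown }).text;
  return typeof text === "string" && text.trim() !== "" ? text.trim() : null;
}

console.log(extractTranscript({ text: "Halo, saya mau tanya status tiket saya" }));
```

A caller would pass the result to `fetch(endpoint, { method: "POST", body, headers })` and feed the parsed JSON into `extractTranscript`.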
+
+ During TTS playback, the header avatar can switch between phoneme images using the metadata returned by the TTS service. The widget currently supports these phoneme IDs:
 
  - `A`
  - `BP`
@@ -324,22 +362,65 @@ During TTS playback, the header avatar can switch between phoneme images using t
 
  Phoneme IDs are normalized case-insensitively in the widget, so values such as `ChJ` and `CHJ` resolve to the same avatar image.
 
- **Expected TTS response shape:**
+ **Preferred webhook TTS asset shape (MinIO-first):**
 
  ```json
  {
-   "audioBase64": "<base64-audio>",
-   "contentType": "audio/mpeg",
-   "durationMs": 1450,
-   "visemes": [
-     { "id": "M", "startMs": 0, "endMs": 120 },
-     { "id": "A", "startMs": 121, "endMs": 260 },
-     { "id": "SZ", "startMs": 261, "endMs": 420 }
-   ]
+   "messages": [{ "type": "text", "text": "Halo, ada yang bisa saya bantu?" }],
+   "tts_assets": {
+     "items": [
+       {
+         "bucket": "chatbot-tts",
+         "response_index": 1,
+         "audio_object_key": "tts/...wav",
+         "phoneme_object_key": "tts/...phonemes.txt",
+         "phoneme_timeline_object_key": "tts/...phoneme-timeline.json",
+         "viseme_timeline_object_key": "tts/...viseme-timeline.json"
+       }
+     ]
+   }
  }
  ```
 
- If `visemes` are omitted, TTS audio can still play normally, but the avatar will not switch phoneme frames dynamically.
+ When this payload is available, the widget resolves MinIO object keys first for audio and timeline metadata. If no usable asset is found, it falls back to `VITE_TTS_ENDPOINT`.
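
The MinIO-first resolution with endpoint fallback might look something like the sketch below. The payload field names are taken from the JSON above; `pickTtsSource`, the `TtsSource` type, and the rule that an asset is "usable" when it has a non-empty `audio_object_key` are assumptions made for illustration, since the README does not define usability.

```typescript
interface TtsAssetItem {
  bucket?: string;
  response_index?: number;
  audio_object_key?: string;
  viseme_timeline_object_key?: string;
}

interface WebhookPayload {
  tts_assets?: { items?: TtsAssetItem[] };
}

// Hypothetical result type: either a MinIO object to fetch or a signal to
// call the fallback TTS endpoint (VITE_TTS_ENDPOINT).
type TtsSource =
  | { kind: "minio"; bucket: string; audioKey: string; visemeTimelineKey?: string }
  | { kind: "endpoint" };

function pickTtsSource(
  payload: WebhookPayload,
  preferWebhook: boolean, // e.g. VITE_PREFER_WEBHOOK_TTS === "true"
  defaultBucket: string,  // e.g. the value of VITE_TTS_MINIO_BUCKET
): TtsSource {
  if (preferWebhook) {
    // Assumption: "usable" means the item carries a non-empty audio key.
    const item = payload.tts_assets?.items?.find(
      (candidate) => typeof candidate.audio_object_key === "string"
        && candidate.audio_object_key.length > 0,
    );
    if (item && item.audio_object_key) {
      return {
        kind: "minio",
        bucket: item.bucket || defaultBucket,
        audioKey: item.audio_object_key,
        visemeTimelineKey: item.viseme_timeline_object_key,
      };
    }
  }
  return { kind: "endpoint" };
}

console.log(pickTtsSource({ tts_assets: { items: [] } }, true, "chatbot-tts"));
```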
+
+ **Expected TTS response shape:**
+
+ ```text
+ body: <raw WAV or other audio binary>
+ content-type: audio/wav
+ x-tts-visemes-b64: W3siaWQiOiJNIiwic3RhcnRNcyI6MCwiZW5kTXMiOjEyMH1d
+ x-tts-phonemes-b64: TSBBCg==
+ x-tts-phoneme-timeline-b64: W3sicGhvbmVtZSI6Ik0iLCJzdGFydE1zIjowLCJlbmRNcyI6MTIwfV0=
+ ```
+
+ `x-tts-phonemes-b64` is the legacy phoneme string, while `x-tts-phoneme-timeline-b64` is the timed phoneme timeline. The base64 headers should decode as:
+
+ Decoded `x-tts-phonemes-b64`:
+
+ ```text
+ M A
+ ```
+
+ Decoded `x-tts-phoneme-timeline-b64`:
+
+ ```json
+ [
+   { "phoneme": "M", "startMs": 0, "endMs": 120 }
+ ]
+ ```
+
+ Decoded `x-tts-visemes-b64` (the sample header value above encodes only the first entry; a longer timeline has this shape):
+
+ ```json
+ [
+   { "id": "M", "startMs": 0, "endMs": 120 },
+   { "id": "A", "startMs": 121, "endMs": 260 },
+   { "id": "SZ", "startMs": 261, "endMs": 420 }
+ ]
+ ```
+
+ The widget still supports the older JSON body format as a fallback, but header-based metadata is now the preferred format for binary audio responses.
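
Reading the base64 metadata headers could be done along these lines. This is a minimal sketch rather than the widget's own parser: the header names and sample values come from the README, while `decodeBase64Utf8`, `readVisemes`, and the `HeaderReader` interface are hypothetical. It assumes the headers carry standard base64 over UTF-8 text.

```typescript
// Decode standard base64 into a UTF-8 string (assumed encoding).
function decodeBase64Utf8(value: string): string {
  const bytes = Uint8Array.from(atob(value), (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

interface VisemeCue {
  id: string;
  startMs: number;
  endMs: number;
}

// Minimal view of a fetch Headers object, so the sketch also works with stubs.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: returns the viseme timeline, or an empty list when the
// header is missing or malformed, in which case audio can still play.
function readVisemes(headers: HeaderReader): VisemeCue[] {
  const raw = headers.get("x-tts-visemes-b64");
  if (!raw) {
    return [];
  }
  try {
    const parsed: unknown = JSON.parse(decodeBase64Utf8(raw));
    return Array.isArray(parsed) ? (parsed as VisemeCue[]) : [];
  } catch {
    return [];
  }
}

// Usage with the sample header values from the README.
const sampleHeaders: Record<string, string> = {
  "x-tts-visemes-b64": "W3siaWQiOiJNIiwic3RhcnRNcyI6MCwiZW5kTXMiOjEyMH1d",
  "x-tts-phonemes-b64": "TSBBCg==",
};
const stub: HeaderReader = { get: (name) => sampleHeaders[name] ?? null };
console.log(readVisemes(stub));
console.log(decodeBase64Utf8(sampleHeaders["x-tts-phonemes-b64"]).trim());
```

With a real cross-origin `fetch` response, `response.headers.get(...)` returns `null` for these headers unless the server lists them in `Access-Control-Expose-Headers`, so the empty-list fallback also covers that case.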
 
  ---