@spatialwalk/avatarkit 1.0.0-beta.10 → 1.0.0-beta.101

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (98)
  1. package/CHANGELOG.md +771 -4
  2. package/README.md +676 -365
  3. package/dist/StreamingAudioPlayer-BULgPjpe.js +643 -0
  4. package/dist/avatar_core_wasm-CQbUl6zN.js +2696 -0
  5. package/dist/avatar_core_wasm-bd762669.wasm +0 -0
  6. package/dist/core/Avatar.d.ts +5 -7
  7. package/dist/core/AvatarController.d.ts +99 -60
  8. package/dist/core/AvatarManager.d.ts +32 -12
  9. package/dist/core/AvatarSDK.d.ts +58 -0
  10. package/dist/core/AvatarView.d.ts +136 -128
  11. package/dist/index-C0A1HA8M.js +18427 -0
  12. package/dist/index.d.ts +2 -4
  13. package/dist/index.js +17 -17
  14. package/dist/next.d.ts +2 -0
  15. package/dist/performance/FrameRateMonitor.d.ts +85 -0
  16. package/dist/types/character-settings.d.ts +7 -1
  17. package/dist/types/character.d.ts +42 -16
  18. package/dist/types/index.d.ts +165 -45
  19. package/dist/vite.d.ts +19 -0
  20. package/next.d.ts +3 -0
  21. package/next.js +187 -0
  22. package/package.json +38 -8
  23. package/vite.d.ts +20 -0
  24. package/vite.js +126 -0
  25. package/dist/StreamingAudioPlayer-Bq2-bQiT.js +0 -319
  26. package/dist/StreamingAudioPlayer-Bq2-bQiT.js.map +0 -1
  27. package/dist/animation/AnimationWebSocketClient.d.ts +0 -50
  28. package/dist/animation/AnimationWebSocketClient.d.ts.map +0 -1
  29. package/dist/animation/utils/eventEmitter.d.ts +0 -13
  30. package/dist/animation/utils/eventEmitter.d.ts.map +0 -1
  31. package/dist/animation/utils/flameConverter.d.ts +0 -26
  32. package/dist/animation/utils/flameConverter.d.ts.map +0 -1
  33. package/dist/audio/AnimationPlayer.d.ts +0 -57
  34. package/dist/audio/AnimationPlayer.d.ts.map +0 -1
  35. package/dist/audio/StreamingAudioPlayer.d.ts +0 -123
  36. package/dist/audio/StreamingAudioPlayer.d.ts.map +0 -1
  37. package/dist/avatar_core_wasm-D4eEi7Eh.js +0 -1666
  38. package/dist/avatar_core_wasm-D4eEi7Eh.js.map +0 -1
  39. package/dist/avatar_core_wasm.wasm +0 -0
  40. package/dist/config/app-config.d.ts +0 -44
  41. package/dist/config/app-config.d.ts.map +0 -1
  42. package/dist/config/constants.d.ts +0 -29
  43. package/dist/config/constants.d.ts.map +0 -1
  44. package/dist/config/sdk-config-loader.d.ts +0 -12
  45. package/dist/config/sdk-config-loader.d.ts.map +0 -1
  46. package/dist/core/Avatar.d.ts.map +0 -1
  47. package/dist/core/AvatarController.d.ts.map +0 -1
  48. package/dist/core/AvatarDownloader.d.ts +0 -95
  49. package/dist/core/AvatarDownloader.d.ts.map +0 -1
  50. package/dist/core/AvatarKit.d.ts +0 -48
  51. package/dist/core/AvatarKit.d.ts.map +0 -1
  52. package/dist/core/AvatarManager.d.ts.map +0 -1
  53. package/dist/core/AvatarView.d.ts.map +0 -1
  54. package/dist/core/NetworkLayer.d.ts +0 -59
  55. package/dist/core/NetworkLayer.d.ts.map +0 -1
  56. package/dist/generated/driveningress/v1/driveningress.d.ts +0 -80
  57. package/dist/generated/driveningress/v1/driveningress.d.ts.map +0 -1
  58. package/dist/generated/driveningress/v2/driveningress.d.ts +0 -81
  59. package/dist/generated/driveningress/v2/driveningress.d.ts.map +0 -1
  60. package/dist/generated/google/protobuf/struct.d.ts +0 -108
  61. package/dist/generated/google/protobuf/struct.d.ts.map +0 -1
  62. package/dist/generated/google/protobuf/timestamp.d.ts +0 -129
  63. package/dist/generated/google/protobuf/timestamp.d.ts.map +0 -1
  64. package/dist/index-bQnEVIkT.js +0 -5999
  65. package/dist/index-bQnEVIkT.js.map +0 -1
  66. package/dist/index.d.ts.map +0 -1
  67. package/dist/index.js.map +0 -1
  68. package/dist/renderer/RenderSystem.d.ts +0 -79
  69. package/dist/renderer/RenderSystem.d.ts.map +0 -1
  70. package/dist/renderer/covariance.d.ts +0 -13
  71. package/dist/renderer/covariance.d.ts.map +0 -1
  72. package/dist/renderer/renderer.d.ts +0 -8
  73. package/dist/renderer/renderer.d.ts.map +0 -1
  74. package/dist/renderer/sortSplats.d.ts +0 -12
  75. package/dist/renderer/sortSplats.d.ts.map +0 -1
  76. package/dist/renderer/webgl/reorderData.d.ts +0 -14
  77. package/dist/renderer/webgl/reorderData.d.ts.map +0 -1
  78. package/dist/renderer/webgl/webglRenderer.d.ts +0 -66
  79. package/dist/renderer/webgl/webglRenderer.d.ts.map +0 -1
  80. package/dist/renderer/webgpu/webgpuRenderer.d.ts +0 -54
  81. package/dist/renderer/webgpu/webgpuRenderer.d.ts.map +0 -1
  82. package/dist/types/character-settings.d.ts.map +0 -1
  83. package/dist/types/character.d.ts.map +0 -1
  84. package/dist/types/index.d.ts.map +0 -1
  85. package/dist/utils/animation-interpolation.d.ts +0 -17
  86. package/dist/utils/animation-interpolation.d.ts.map +0 -1
  87. package/dist/utils/cls-tracker.d.ts +0 -17
  88. package/dist/utils/cls-tracker.d.ts.map +0 -1
  89. package/dist/utils/error-utils.d.ts +0 -27
  90. package/dist/utils/error-utils.d.ts.map +0 -1
  91. package/dist/utils/logger.d.ts +0 -35
  92. package/dist/utils/logger.d.ts.map +0 -1
  93. package/dist/utils/reqId.d.ts +0 -20
  94. package/dist/utils/reqId.d.ts.map +0 -1
  95. package/dist/wasm/avatarCoreAdapter.d.ts +0 -188
  96. package/dist/wasm/avatarCoreAdapter.d.ts.map +0 -1
  97. package/dist/wasm/avatarCoreMemory.d.ts +0 -141
  98. package/dist/wasm/avatarCoreMemory.d.ts.map +0 -1
package/README.md CHANGED
@@ -1,13 +1,12 @@
1
- # SPAvatarKit SDK
1
+ # AvatarKit SDK
2
2
 
3
- Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supporting audio-driven animation rendering and high-quality 3D rendering.
3
+ Real-time virtual avatar rendering SDK for Web, supporting audio-driven animation and high-quality 3D rendering.
4
4
 
5
5
  ## 🚀 Features
6
6
 
7
- - **3D Gaussian Splatting Rendering** - Based on the latest point cloud rendering technology, providing high-quality 3D virtual avatars
8
- - **Audio-Driven Real-Time Animation Rendering** - Users provide audio data, SDK handles receiving animation data and rendering
9
- - **WebGPU/WebGL Dual Rendering Backend** - Automatically selects the best rendering backend for compatibility
10
- - **WASM High-Performance Computing** - Uses C++ compiled WebAssembly modules for geometric calculations
7
+ - **High-Quality 3D Rendering** - GPU-accelerated avatar rendering with automatic backend selection
8
+ - **Audio-Driven Real-Time Animation** - Send audio data, SDK handles animation and rendering
9
+ - **Multi-Avatar Support** - Support multiple avatar instances simultaneously, each with independent state and rendering
11
10
  - **TypeScript Support** - Complete type definitions and IntelliSense
12
11
  - **Modular Architecture** - Clear component separation, easy to integrate and extend
13
12
 
@@ -17,369 +16,736 @@ Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supportin
17
16
  npm install @spatialwalk/avatarkit
18
17
  ```
19
18
 
19
+ ## 🚧 Release Gate (Hard Rule)
20
+
21
+ Releases must pass all gates before publishing. Do not publish via ad-hoc manual commands.
22
+
23
+ Required gate checks:
24
+
25
+ ```bash
26
+ pnpm typecheck
27
+ pnpm test
28
+ pnpm build
29
+ ./tools/check_perf_baseline_release_gate.sh
30
+ ```
31
+
32
+ If the iteration includes bugfixes, `docs/bugfix-history.md` must contain completed rows (test mapping + red/green evidence).
33
+
34
+ Hotfix bypass is allowed only in emergencies and must be recorded:
35
+
36
+ ```bash
37
+ HOTFIX_BYPASS=1 ./tools/check_perf_baseline_release_gate.sh
38
+ ```
39
+
40
+ ## 🧪 Benchmark Demo (Web SDK)
41
+
42
+ Use the dedicated benchmark demo (independent from `vanilla/`) for perf/render baseline runs:
43
+
44
+ ```bash
45
+ pnpm demo:benchmark
46
+ ```
47
+
48
+ ## 🚀 Demo Repository
49
+
50
+ <div align="center">
51
+
52
+ ### 📌 **Quick Start: Check Out Our Demo Repository**
53
+
54
+ We provide complete example code and best practices to help you quickly integrate the SDK.
55
+
56
+ **The demo repository includes:**
57
+ - ✅ Complete integration examples
58
+ - ✅ Usage examples for both SDK mode and Host mode
59
+ - ✅ Audio processing examples (PCM16, WAV, MP3, etc.)
60
+ - ✅ Vite configuration examples
61
+ - ✅ Next.js configuration examples
62
+ - ✅ Best practices for common scenarios
63
+
64
+ **[👉 View Demo Repository](https://github.com/spatialwalk/avatarkit-demo)** | *If the repository is not yet available, please contact the team*
65
+
66
+ </div>
67
+
68
+ ---
69
+
70
+ ## 🔧 Vite Configuration (Recommended)
71
+
72
+ If you are using Vite as your build tool, we strongly recommend our Vite plugin, which automatically applies all necessary WASM file configuration so you don't need to set it up manually.
73
+
74
+ ### Using the Plugin
75
+
76
+ Add the plugin to `vite.config.ts`:
77
+
78
+ ```typescript
79
+ import { defineConfig } from 'vite'
80
+ import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'
81
+
82
+ export default defineConfig({
83
+ plugins: [
84
+ avatarkitVitePlugin(), // Just add this line
85
+ ],
86
+ })
87
+ ```
88
+
89
+ ### Plugin Features
90
+
91
+ The plugin automatically handles:
92
+
93
+ - ✅ **Development Server**: Automatically sets the correct MIME type (`application/wasm`) for WASM files
94
+ - ✅ **Build Time**: Automatically copies WASM files to `dist/assets/` directory
95
+ - ✅ **Cloudflare Pages**: Automatically generates `_headers` file to ensure WASM files use the correct MIME type
96
+ - ✅ **Vite Configuration**: Automatically configures `optimizeDeps`, `assetsInclude`, `assetsInlineLimit`, and other options
97
+
98
+ ### Manual Configuration (Without Plugin)
99
+
100
+ If you don't use the Vite plugin, you need to manually configure the following:
101
+
102
+ ```typescript
103
+ // vite.config.ts
104
+ export default defineConfig({
105
+ optimizeDeps: {
106
+ exclude: ['@spatialwalk/avatarkit'],
107
+ },
108
+ assetsInclude: ['**/*.wasm'],
109
+ build: {
110
+ assetsInlineLimit: 0,
111
+ rollupOptions: {
112
+ output: {
113
+ assetFileNames: (assetInfo) => {
114
+ if (assetInfo.name?.endsWith('.wasm')) {
115
+ return 'assets/[name][extname]'
116
+ }
117
+ return 'assets/[name]-[hash][extname]'
118
+ },
119
+ },
120
+ },
121
+ },
122
+ // Development server needs to manually configure middleware to set WASM MIME type
123
+ configureServer(server) {
124
+ server.middlewares.use((req, res, next) => {
125
+ if (req.url?.endsWith('.wasm')) {
126
+ res.setHeader('Content-Type', 'application/wasm')
127
+ }
128
+ next()
129
+ })
130
+ },
131
+ })
132
+ ```
133
+
134
+ ## 🔧 Next.js Configuration
135
+
136
+ For Next.js projects, use the `withAvatarkit` wrapper to automatically handle WASM file configuration with webpack.
137
+
138
+ ### Using the Plugin
139
+
140
+ Wrap your Next.js config in `next.config.mjs`:
141
+
142
+ ```javascript
143
+ import { withAvatarkit } from '@spatialwalk/avatarkit/next'
144
+
145
+ export default withAvatarkit({
146
+ // ...your existing Next.js config
147
+ })
148
+ ```
149
+
150
+ ### Plugin Features
151
+
152
+ The plugin automatically handles:
153
+
154
+ - ✅ **Path Fix**: Patches asset path resolution so WASM files are correctly loaded at `/_next/static/chunks/`
155
+ - ✅ **WASM Copying**: Copies `.wasm` files into `static/chunks/` via a custom webpack plugin (client build only)
156
+ - ✅ **Content-Type Headers**: Adds `application/wasm` response header for `/_next/static/chunks/*.wasm`
157
+ - ✅ **Config Chaining**: Preserves your existing `webpack` and `headers` configurations
158
+
159
+ ## 🔐 Authentication
160
+
161
+ All environments require an **App ID** and **Session Token** for authentication.
162
+
163
+ ### App ID
164
+
165
+ The App ID is used to identify your application. You can obtain your App ID by:
166
+
167
+ 1. **For Testing**: Use the default test App ID provided in demo repositories (paired with test Session Token, only works with publicly available test avatars like Rohan, Dr.Kellan, Priya, Josh, etc.)
168
+ 2. **For Production**: Visit the [Developer Platform](https://dash.spatialreal.ai) to create your own App and avatars. You will receive your own App ID after creating an App.
169
+
170
+ ### Session Token
171
+
172
+ The Session Token is required for authentication and must be obtained from your SDK provider.
173
+
174
+ **⚠️ Important Notes:**
175
+ - The Session Token must be valid and not expired
176
+ - In production applications, you **must** manually inject a valid Session Token obtained from your SDK provider
177
+ - The default Session Token provided in demo repositories is **only for demonstration purposes** and can only be used with test avatars
178
+ - If you want to create your own avatars and test them, please visit the [Developer Platform](https://dash.spatialreal.ai) to create your own App and generate Session Tokens
179
+
180
+ **How to Set Session Token:**
181
+
182
+ ```typescript
183
+ // Initialize SDK with App ID
184
+ await AvatarSDK.initialize('your-app-id', configuration)
185
+
186
+ // Set Session Token (can be called before or after initialization)
187
+ // If called before initialization, the token will be automatically set when you initialize the SDK
188
+ AvatarSDK.setSessionToken('your-session-token')
189
+
190
+ // Get current Session Token
191
+ const sessionToken = AvatarSDK.sessionToken
192
+ ```
193
+
194
+ **Token Management:**
195
+ - The Session Token can be set at any time using `AvatarSDK.setSessionToken(token)`
196
+ - If you set the token before initializing the SDK, it will be automatically applied during initialization
197
+ - If you set the token after initialization, it will be applied immediately
198
+ - Handle token refresh logic in your application as needed (e.g., when token expires)
199
+
200
+ **For Production Integration:**
201
+ - Obtain a valid Session Token from your SDK provider
202
+ - Store the token securely (never expose it in client-side code if possible)
203
+ - Implement token refresh logic to handle token expiration
204
+ - Use `AvatarSDK.setSessionToken(token)` to inject the token programmatically
205
+
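The refresh advice above can be sketched as follows. `fetchSessionToken` is a hypothetical helper that calls your own backend; `refreshDelayMs` and `keepTokenFresh` are illustrative names, not SDK APIs:

```typescript
// Compute how long to wait before refreshing: one minute before expiry,
// never returning a negative delay.
function refreshDelayMs(expiresAtMs: number, nowMs: number, marginMs = 60_000): number {
  return Math.max(0, expiresAtMs - nowMs - marginMs)
}

// Fetch a token, inject it, and schedule the next refresh ahead of expiry.
async function keepTokenFresh(
  fetchSessionToken: () => Promise<{ token: string; expiresAtMs: number }>,
  setToken: (token: string) => void, // e.g. AvatarSDK.setSessionToken
): Promise<void> {
  const { token, expiresAtMs } = await fetchSessionToken()
  setToken(token)
  setTimeout(() => { void keepTokenFresh(fetchSessionToken, setToken) },
    refreshDelayMs(expiresAtMs, Date.now()))
}
```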
20
206
  ## 🎯 Quick Start
21
207
 
208
+ ### ⚠️ Important: Audio Context Initialization
209
+
210
+ **Before using any audio-related features, you MUST initialize the audio context in a user gesture context** (e.g., `click`, `touchstart` event handlers). This is required by browser security policies. Calling `initializeAudioContext()` outside a user gesture will fail.
211
+
22
212
  ### Basic Usage
23
213
 
24
214
  ```typescript
25
215
  import {
26
- AvatarKit,
216
+ AvatarSDK,
27
217
  AvatarManager,
28
218
  AvatarView,
29
219
  Configuration,
30
- Environment
220
+ Environment,
221
+ DrivingServiceMode,
222
+ LogLevel
31
223
  } from '@spatialwalk/avatarkit'
32
224
 
33
225
  // 1. Initialize SDK
226
+
34
227
  const configuration: Configuration = {
35
- environment: Environment.test,
228
+ environment: Environment.cn,
229
+ drivingServiceMode: DrivingServiceMode.sdk, // Optional, 'sdk' is default
230
+ // - DrivingServiceMode.sdk: SDK mode - SDK handles network communication
231
+ // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
232
+ logLevel: LogLevel.off, // Optional, 'off' is default
233
+ // - LogLevel.off: Disable all logs
234
+ // - LogLevel.error: Only error logs
235
+ // - LogLevel.warning: Warning and error logs
236
+ // - LogLevel.all: All logs (info, warning, error)
237
+ audioFormat: { // Default is { channelCount: 1, sampleRate: 16000 }
238
+ channelCount: 1, // Fixed to 1 (mono)
239
+ sampleRate: 16000 // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
240
+ // ⚠️ Must match your actual audio sample rate. Mismatched sample rate will cause playback issues.
241
+ }
242
+ // characterApiBaseUrl: 'https://custom-api.example.com' // Optional, internal debug config, can be ignored
36
243
  }
37
244
 
38
- await AvatarKit.initialize('your-app-id', configuration)
245
+ await AvatarSDK.initialize('your-app-id', configuration)
39
246
 
40
- // Set sessionToken (if needed, call separately)
41
- // AvatarKit.setSessionToken('your-session-token')
247
+ // Set Session Token (required for authentication)
248
+ // You must obtain a valid Session Token from your SDK provider
249
+ // See Authentication section above for more details
250
+ AvatarSDK.setSessionToken('your-session-token')
42
251
 
43
- // 2. Load character
44
- const avatarManager = new AvatarManager()
252
+ // 2. Load avatar
253
+ const avatarManager = AvatarManager.shared
45
254
  const avatar = await avatarManager.load('character-id', (progress) => {
46
255
  console.log(`Loading progress: ${progress.progress}%`)
47
256
  })
48
257
 
49
258
  // 3. Create view (automatically creates Canvas and AvatarController)
50
- // Network mode (default)
259
+ // The playback mode is determined by drivingServiceMode in AvatarSDK configuration
260
+ // - DrivingServiceMode.sdk: SDK mode - SDK handles network communication
261
+ // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
51
262
  const container = document.getElementById('avatar-container')
52
- const avatarView = new AvatarView(avatar, {
53
- container: container,
54
- playbackMode: 'network' // Optional, 'network' is default
263
+ const avatarView = new AvatarView(avatar, container)
264
+
265
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
266
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
267
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
268
+ button.addEventListener('click', async () => {
269
+ // Initialize audio context - MUST be in user gesture context
270
+ await avatarView.controller.initializeAudioContext()
271
+
272
+ // 5. Start real-time communication (SDK mode only)
273
+ // Note: start() initiates the WebSocket connection asynchronously.
274
+ // Wait for onConnectionState === 'connected' before calling send().
275
+ await avatarView.controller.start()
276
+
277
+ // 6. Wait for connection to be ready
278
+ await new Promise<void>((resolve) => {
279
+ avatarView.controller.onConnectionState = (state) => {
280
+ if (state === ConnectionState.connected) resolve()
281
+ }
282
+ })
283
+
284
+ // 7. Send audio data (SDK mode, must be mono PCM16 format matching configured sample rate)
285
+ // audioData: ArrayBuffer or Uint8Array containing PCM16 (S16LE) audio samples
286
+ // ⚠️ Byte length MUST be even (2 bytes per sample). Odd-length data will cause server-side
287
+ // validation error and WebSocket disconnect.
288
+ // - PCM files: Can be directly read as ArrayBuffer
289
+ // - WAV files: Extract PCM data from WAV format (may require resampling)
290
+ // - MP3 files: Decode first (e.g., using AudioContext.decodeAudioData()), then convert to PCM16
291
+ const audioData = new ArrayBuffer(1024) // Placeholder: Replace with actual PCM16 audio data
292
+ avatarView.controller.send(audioData, false) // Send audio data
293
+ avatarView.controller.send(audioData, true) // end=true marks the end of current conversation round
55
294
  })
56
-
57
- // 4. Start real-time communication (network mode only)
58
- await avatarView.avatarController.start()
59
-
60
- // 5. Send audio data (network mode)
61
- // ⚠️ Important: Audio must be 16kHz mono PCM16 format
62
- // If audio is Uint8Array, you can use slice().buffer to convert to ArrayBuffer
63
- const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio data (512 samples = 1024 bytes)
64
- const audioData = audioUint8.slice().buffer // Simplified conversion, works for ArrayBuffer and SharedArrayBuffer
65
- avatarView.avatarController.send(audioData, false) // Send audio data, will automatically start playing after accumulating enough data
66
- avatarView.avatarController.send(audioData, true) // end=true means immediately return animation data, no longer accumulating
67
295
  ```
68
296
 
69
- ### External Data Mode Example
297
+ ### Host Mode Example
70
298
 
71
299
  ```typescript
72
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
73
300
 
74
- // 1-3. Same as network mode (initialize SDK, load character)
301
+ // 1-3. Same as SDK mode (initialize SDK, load avatar)
75
302
 
76
- // 3. Create view with external data mode
303
+ // 3. Create view with Host mode
77
304
  const container = document.getElementById('avatar-container')
78
- const avatarView = new AvatarView(avatar, {
79
- container: container,
80
- playbackMode: AvatarPlaybackMode.external
81
- })
82
-
83
- // 4. Start playback with initial data (obtained from your service)
84
- // Note: Audio and animation data should be obtained from your backend service
85
- const initialAudioChunks = [{ data: audioData1, isLast: false }, { data: audioData2, isLast: false }]
86
- const initialKeyframes = animationData1 // Animation keyframes from your service
87
-
88
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
89
-
90
- // 5. Stream additional data as needed
91
- avatarView.avatarController.sendAudioChunk(audioData3, false)
92
- avatarView.avatarController.sendKeyframes(animationData2)
305
+ const avatarView = new AvatarView(avatar, container)
306
+
307
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
308
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
309
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
310
+ button.addEventListener('click', async () => {
311
+ // Initialize audio context - MUST be in user gesture context
312
+ await avatarView.controller.initializeAudioContext()
313
+
314
+ // 5. Host Mode Workflow:
315
+ // Send audio data first to get conversationId, then use it to send animation data
316
+ const conversationId = avatarView.controller.yieldAudioData(audioData, false)
317
+ avatarView.controller.yieldFramesData(animationDataArray, conversationId) // animationDataArray: (Uint8Array | ArrayBuffer)[]
93
318
  ```
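In practice, audio for a conversation round is streamed in chunks rather than as one buffer. A minimal sketch of that chunking (`chunkPcm16` is a hypothetical helper, not an SDK API; 3200 bytes is roughly 100 ms of 16 kHz mono PCM16):

```typescript
// Split a PCM16 byte buffer into fixed-size chunks. Chunk sizes stay even
// (2 bytes per sample), as required by the SDK.
function chunkPcm16(data: Uint8Array, chunkBytes = 3200): Uint8Array[] {
  if (chunkBytes % 2 !== 0) throw new Error('chunkBytes must be even (2 bytes per sample)')
  const chunks: Uint8Array[] = []
  for (let off = 0; off < data.length; off += chunkBytes) {
    chunks.push(data.subarray(off, Math.min(off + chunkBytes, data.length)))
  }
  return chunks
}
```

Each chunk can then be passed to `yieldAudioData()` (or `send()` in SDK mode), with the final chunk flagged as the end of the round.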
94
319
 
95
320
  ### Complete Examples
96
321
 
97
- Check the example code in the GitHub repository for complete usage flows for both modes.
98
-
99
- **Example Project:** [Avatarkit-web-demo](https://github.com/spatialwalk/Avatarkit-web-demo)
100
-
101
- This repository contains complete examples for Vanilla JS, Vue 3, and React, demonstrating:
102
- - Network mode: Real-time audio input with automatic animation data reception
103
- - External data mode: Custom data sources with manual audio/animation data management
322
+ This SDK supports two usage modes:
323
+ - SDK mode: Real-time audio input with automatic animation data reception
324
+ - Host mode: Custom data sources with manual audio/animation data management
104
325
 
105
326
  ## 🏗️ Architecture Overview
106
327
 
107
- ### Three-Layer Architecture
108
-
109
- The SDK uses a three-layer architecture for clear separation of concerns:
110
-
111
- 1. **Rendering Layer (AvatarView)** - Responsible for 3D rendering only
112
- 2. **Playback Layer (AvatarController)** - Manages audio/animation synchronization and playback
113
- 3. **Network Layer (NetworkLayer)** - Handles WebSocket communication (only in network mode)
114
-
115
328
  ### Core Components
116
329
 
117
- - **AvatarKit** - SDK initialization and management
118
- - **AvatarManager** - Character resource loading and management
119
- - **AvatarView** - 3D rendering view (rendering layer)
120
- - **AvatarController** - Audio/animation playback controller (playback layer)
121
- - **NetworkLayer** - WebSocket communication (network layer, automatically composed in network mode)
122
- - **AvatarCoreAdapter** - WASM module adapter
330
+ - **AvatarSDK** - SDK initialization and management
331
+ - **AvatarManager** - Avatar resource loading and management
332
+ - **AvatarView** - 3D rendering view
333
+ - **AvatarController** - Audio/animation playback controller
123
334
 
124
335
  ### Playback Modes
125
336
 
126
- The SDK supports two playback modes, configured when creating `AvatarView`:
337
+ The SDK supports two playback modes, configured in `AvatarSDK.initialize()`:
127
338
 
128
- #### 1. Network Mode (Default)
129
- - SDK handles WebSocket communication automatically
339
+ #### 1. SDK Mode (Default)
340
+ - Configured via `drivingServiceMode: DrivingServiceMode.sdk` in `AvatarSDK.initialize()`
341
+ - SDK handles network communication automatically
130
342
  - Send audio data via `AvatarController.send()`
131
343
  - SDK receives animation data from backend and synchronizes playback
132
344
  - Best for: Real-time audio input scenarios
133
345
 
134
- #### 2. External Data Mode
135
- - External components manage their own network/data fetching
136
- - External components provide both audio and animation data
346
+ #### 2. Host Mode
347
+ - Configured via `drivingServiceMode: DrivingServiceMode.host` in `AvatarSDK.initialize()`
348
+ - Host application manages its own network/data fetching
349
+ - Host application provides both audio and animation data
137
350
  - SDK only handles synchronized playback
138
351
  - Best for: Custom data sources, pre-recorded content, or custom network implementations
139
352
 
353
+ **Note:** The playback mode is determined by `drivingServiceMode` in `AvatarSDK.initialize()` configuration.
354
+
355
+ ### Fallback Mechanism
356
+
357
+ The SDK includes a fallback mechanism to ensure audio playback continues even when animation data is unavailable:
358
+
359
+ - **SDK Mode Connection Failure**: If connection fails to establish within 15 seconds, the SDK automatically enters fallback mode. Audio data can still be sent and will play normally, even though no animation data will be received. This ensures audio playback is not interrupted.
360
+ - **SDK Mode Server Error**: If the server returns an error after connection is established, the SDK automatically enters audio-only mode for that session.
361
+ - **Host Mode**: If empty animation data is provided (empty array or undefined), the SDK automatically enters audio-only mode.
362
+ - Once in audio-only mode, any subsequent animation data for that session will be ignored, and only audio will continue playing.
363
+ - The fallback mode is interruptible, just like normal playback mode.
364
+ - Connection state callbacks (`onConnectionState`) will notify you when connection fails or times out.
365
+
140
366
  ### Data Flow
141
367
 
142
- #### Network Mode Flow
368
+ #### SDK Mode Flow
143
369
 
144
370
  ```
145
- User audio input (16kHz mono PCM16)
371
+ Audio input (PCM16 mono)
146
372
 
147
- AvatarController.send()
373
+ AvatarController.send()
148
374
 
149
- NetworkLayer → WebSocket → Backend processing
375
+ Backend processingAnimation data
150
376
 
151
- Backend returns animation data (FLAME keyframes)
377
+ SDK synchronizes audio + animation playback
152
378
 
153
- NetworkLayer → AvatarController → AnimationPlayer
154
-
155
- FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
156
-
157
- AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
158
-
159
- RenderSystem → WebGPU/WebGL → Canvas rendering
379
+ GPU renderingCanvas
160
380
  ```
161
381
 
162
- #### External Data Mode Flow
382
+ #### Host Mode Flow
163
383
 
164
384
  ```
165
385
  External data source (audio + animation)
166
386
 
167
- AvatarController.play(initialAudio, initialKeyframes) // Start playback
168
-
169
- AvatarController.sendAudioChunk() // Stream additional audio
170
- AvatarController.sendKeyframes() // Stream additional animation
171
-
172
- AvatarController → AnimationPlayer (synchronized playback)
173
-
174
- FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
387
+ AvatarController.yieldAudioData(audioChunk) returns conversationId
388
+ AvatarController.yieldFramesData(dataArray, conversationId)
175
389
 
176
- AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
390
+ SDK synchronizes audio + animation playback
177
391
 
178
- RenderSystem WebGPU/WebGL → Canvas rendering
392
+ GPU rendering → Canvas
179
393
  ```
180
394
 
181
- **Note:**
182
- - In network mode, users provide audio data, SDK handles network communication and animation data reception
183
- - In external data mode, users provide both audio and animation data, SDK handles synchronized playback only
184
-
185
395
  ### Audio Format Requirements
186
396
 
187
- **⚠️ Important:** The SDK requires audio data to be in **16kHz mono PCM16** format:
397
+ **⚠️ Important:** The SDK requires audio data to be in **mono PCM16** format:
188
398
 
189
- - **Sample Rate**: 16kHz (16000 Hz) - This is a backend requirement
190
- - **Channels**: Mono (single channel)
399
+ - **Sample Rate**: Configurable via `audioFormat.sampleRate` in SDK initialization (default: 16000 Hz)
400
+ - Supported sample rates: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
401
+ - The configured sample rate will be used for both audio recording and playback
402
+ - **Channels**: Mono (single channel) - Fixed to 1 channel
191
403
  - **Format**: PCM16 (16-bit signed integer, little-endian)
192
404
  - **Byte Order**: Little-endian
193
405
 
194
406
  **Audio Data Format:**
195
- - Each sample is 2 bytes (16-bit)
407
+ - Each sample is 2 bytes (16-bit signed integer, little-endian)
196
408
  - Audio data should be provided as `ArrayBuffer` or `Uint8Array`
197
- - For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
409
+ - For example, with 16kHz sample rate: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
410
+ - For 48kHz sample rate: 1 second of audio = 48000 samples × 2 bytes = 96000 bytes
411
+
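The size arithmetic above can be captured in two small helpers (illustrative names, not SDK APIs):

```typescript
const BYTES_PER_SAMPLE = 2 // PCM16, mono

// Byte length of a given duration of mono PCM16 audio
function pcm16ByteLength(seconds: number, sampleRate: number): number {
  return Math.round(seconds * sampleRate) * BYTES_PER_SAMPLE
}

// Duration in seconds of a mono PCM16 buffer; rejects odd byte lengths,
// which the SDK also treats as invalid
function pcm16DurationSeconds(byteLength: number, sampleRate: number): number {
  if (byteLength % 2 !== 0) throw new Error('PCM16 data must have an even byte length')
  return byteLength / BYTES_PER_SAMPLE / sampleRate
}
```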
412
+ **Audio Data Source:**
413
+ The `audioData` parameter represents raw PCM16 audio samples in the configured sample rate and mono format. Common audio sources include:
414
+ - **PCM files**: Raw PCM16 files can be directly read as `ArrayBuffer` or `Uint8Array` and sent to the SDK (ensure sample rate matches configuration)
415
+ - **WAV files**: WAV files contain PCM16 audio data in their data chunk. After extracting the PCM data from the WAV file format, it can be sent to the SDK (may require resampling if sample rate differs)
416
+ - **MP3 files**: MP3 files need to be decoded first (e.g., using `AudioContext.decodeAudioData()` or a decoder library), then converted from the decoded format to PCM16 before sending to the SDK
417
+ - **Microphone input**: Real-time microphone audio needs to be captured and converted to PCM16 format at the configured sample rate before sending
418
+ - **Other audio sources**: Any audio source must be converted to mono PCM16 format at the configured sample rate before sending
419
+
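For the microphone bullet above, captured audio typically arrives as `Float32Array` frames (e.g. from an `AudioWorkletProcessor` or `ScriptProcessorNode`). A sketch of the Float32 → PCM16 step (`floatToPcm16` is a hypothetical helper, not an SDK API):

```typescript
// Convert Float32 samples (-1.0..1.0) to PCM16 little-endian bytes,
// clamping out-of-range values before scaling to int16.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const out = new ArrayBuffer(samples.length * 2)
  const view = new DataView(out)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true) // little-endian
  }
  return out
}
```

The resulting buffer can be passed directly to `send()` (or `yieldAudioData()` in Host mode), provided the capture sample rate matches the configured `audioFormat.sampleRate`.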
420
+ **Example: Processing WAV and MP3 Files:**
421
+ ```typescript
422
+ // WAV file processing
423
+ async function processWAVFile(wavFile: File): Promise<ArrayBuffer> {
424
+ const arrayBuffer = await wavFile.arrayBuffer()
425
+ const view = new DataView(arrayBuffer)
426
+
427
+ // WAV format: Skip header (usually 44 bytes for standard WAV)
428
+ // Check RIFF header
429
+ if (view.getUint32(0, true) !== 0x46464952) { // "RIFF"
430
+ throw new Error('Invalid WAV file')
431
+ }
432
+
433
+ // Find "data" chunk (offset may vary)
434
+ let dataOffset = 44 // Standard WAV header size
435
+ // For non-standard WAV files, you may need to search for "data" chunk
436
+ // This is a simplified example - production code should parse chunks properly
437
+
438
+ const pcmData = arrayBuffer.slice(dataOffset)
439
+ return pcmData
440
+ }
441
+
442
+ // MP3 file processing
443
+ async function processMP3File(mp3File: File, targetSampleRate: number): Promise<ArrayBuffer> {
444
+ const arrayBuffer = await mp3File.arrayBuffer()
445
+ const audioContext = new AudioContext({ sampleRate: targetSampleRate })
446
+
447
+ // Decode MP3 to AudioBuffer
448
+ const audioBuffer = await audioContext.decodeAudioData(arrayBuffer.slice(0))
449
+
450
+ // Convert AudioBuffer to PCM16 ArrayBuffer
451
+ const length = audioBuffer.length
452
+ const channels = audioBuffer.numberOfChannels
453
+ const pcm16Buffer = new ArrayBuffer(length * 2)
454
+ const pcm16View = new DataView(pcm16Buffer)
455
+
456
+ // Mix down to mono if stereo
457
+ const sourceData = channels === 1
458
+ ? audioBuffer.getChannelData(0)
459
+ : new Float32Array(length)
460
+
461
+ if (channels > 1) {
462
+ const leftChannel = audioBuffer.getChannelData(0)
463
+ const rightChannel = audioBuffer.getChannelData(1)
464
+ for (let i = 0; i < length; i++) {
465
+ sourceData[i] = (leftChannel[i] + rightChannel[i]) / 2 // Mix to mono
466
+ }
467
+ }
468
+
469
+ // Convert float32 (-1.0 to 1.0) to int16 (-32768 to 32767)
470
+ for (let i = 0; i < length; i++) {
471
+ const sample = Math.max(-1, Math.min(1, sourceData[i])) // Clamp
472
+ const int16Sample = sample < 0 ? sample * 0x8000 : sample * 0x7FFF
473
+ pcm16View.setInt16(i * 2, int16Sample, true) // little-endian
474
+ }
475
+
476
+ audioContext.close()
477
+ return pcm16Buffer
478
+ }
479
+
480
+ // Usage example:
481
+ // const wavPcmData = await processWAVFile(wavFile)
482
+ // avatarView.controller.send(wavPcmData, false)
483
+ //
484
+ // const mp3PcmData = await processMP3File(mp3File, 16000) // 16kHz
485
+ // avatarView.controller.send(mp3PcmData, false)
486
+ ```
 
  **Resampling:**
- - If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending to the SDK
+ - If your audio source is at a different sample rate, you must resample it to match the configured sample rate before sending to the SDK
  - For high-quality resampling, we recommend using Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
  - See example projects for resampling implementation
 
+ **Configuration Example:**
+ ```typescript
+ const configuration: Configuration = {
+   environment: Environment.cn,
+   audioFormat: {
+     channelCount: 1, // Fixed to 1 (mono)
+     sampleRate: 48000 // Choose from: 8000, 16000, 22050, 24000, 32000, 44100, 48000
+   }
+ }
+ ```
+
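PCM16 uses 2 bytes per sample, so the configured sample rate fixes how many bytes one second of mono audio occupies (16000 Hz gives 32000 bytes per second). A tiny helper for sanity-checking buffer sizes (illustrative, not part of the SDK):

```typescript
// Bytes occupied by `seconds` of mono PCM16 audio at `sampleRate` Hz.
// PCM16 = 2 bytes per sample, 1 channel.
function pcm16ByteLength(seconds: number, sampleRate: number): number {
  return Math.round(seconds * sampleRate) * 2
}

// pcm16ByteLength(1, 16000) -> 32000 bytes per second at 16 kHz
```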
  ## 📚 API Reference
 
- ### AvatarKit
+ ### AvatarSDK
 
  The core management class of the SDK, responsible for initialization and global configuration.
 
  ```typescript
  // Initialize SDK
- await AvatarKit.initialize(appId: string, configuration: Configuration)
+ await AvatarSDK.initialize(appId: string, configuration: Configuration)
 
  // Check initialization status
- const isInitialized = AvatarKit.isInitialized
+ const isInitialized = AvatarSDK.isInitialized
 
  // Get initialized app ID
- const appId = AvatarKit.appId
+ const appId = AvatarSDK.appId
 
  // Get configuration
- const config = AvatarKit.configuration
+ const config = AvatarSDK.configuration
 
- // Set sessionToken (if needed, call separately)
- AvatarKit.setSessionToken('your-session-token')
+ // Set session token (required for authentication)
+ // You must obtain a valid session token from your SDK provider
+ // See the Authentication section for details
+ AvatarSDK.setSessionToken('your-session-token')
 
  // Set userId (optional, for telemetry)
- AvatarKit.setUserId('user-id')
+ AvatarSDK.setUserId('user-id')
 
  // Get sessionToken
- const sessionToken = AvatarKit.sessionToken
+ const sessionToken = AvatarSDK.sessionToken
 
  // Get userId
- const userId = AvatarKit.userId
+ const userId = AvatarSDK.userId
 
  // Get SDK version
- const version = AvatarKit.version
+ const version = AvatarSDK.version
 
  // Cleanup resources (must be called when no longer in use)
- AvatarKit.cleanup()
+ AvatarSDK.cleanup()
  ```
 
  ### AvatarManager
 
- Character resource manager, responsible for downloading, caching, and loading character data.
+ Avatar resource manager, responsible for downloading, caching, and loading avatar data. Use the singleton instance via `AvatarManager.shared`.
 
  ```typescript
- const manager = new AvatarManager()
+ // Get singleton instance
+ const manager = AvatarManager.shared
 
- // Load character
+ // Load avatar
  const avatar = await manager.load(
-   characterId: string,
+   id: string,
    onProgress?: (progress: LoadProgressInfo) => void
  )
 
  // Clear cache
- manager.clearCache()
+ manager.clearAll()
  ```
 
  ### AvatarView
 
- 3D rendering view (rendering layer), responsible for 3D rendering only. Internally automatically creates and manages `AvatarController`.
+ 3D rendering view, responsible only for rendering. Internally it automatically creates and manages `AvatarController`.
 
- **⚠️ Important Limitation:** Currently, the SDK only supports one AvatarView instance at a time. If you need to switch characters, you must first call the `dispose()` method to clean up the current AvatarView, then create a new instance.
+ ```typescript
+ constructor(avatar: Avatar, container: HTMLElement)
+ ```
+
+ **Parameters:**
+ - `avatar`: Avatar instance
+ - `container`: Canvas container element (required)
+   - Canvas automatically uses the full size of the container (width and height)
+   - Canvas aspect ratio follows the container, so size the container to control the aspect ratio
+   - Canvas will be automatically added to the container
+   - SDK automatically handles resize events via ResizeObserver
 
- **Playback Mode Configuration:**
+ **Playback Mode:**
+ - The playback mode is determined by `drivingServiceMode` in the `AvatarSDK.initialize()` configuration
  - The playback mode is fixed when creating `AvatarView` and persists throughout its lifecycle
  - Cannot be changed after creation
 
  ```typescript
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
-
  // Create view (Canvas is automatically added to container)
- // Network mode (default)
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar: Avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.network // Optional, default is 'network'
- })
+ const avatarView = new AvatarView(avatar, container)
 
- // External data mode
- const avatarView = new AvatarView(avatar: Avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.external
- })
+ // Wait for first frame to render
+ avatarView.onFirstRendering = () => {
+   // First frame rendered
+ }
 
- // Get playback mode
- const mode = avatarView.playbackMode // 'network' | 'external'
+ // Get or set avatar transform (position and scale)
+ // Get current transform
+ const currentTransform = avatarView.avatarTransform // { x: number, y: number, scale: number }
 
- // Cleanup resources (must be called before switching characters)
+ // Set transform
+ avatarView.avatarTransform = { x, y, scale }
+ // - x: Horizontal offset in normalized coordinates (-1 to 1, where -1 = left edge, 0 = center, 1 = right edge)
+ // - y: Vertical offset in normalized coordinates (-1 to 1, where -1 = bottom edge, 0 = center, 1 = top edge)
+ // - scale: Scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
+
+ // Cleanup resources (must be called before switching avatars)
  avatarView.dispose()
  ```
 
- **Character Switching Example:**
+ **Switching Avatars:**
+
+ To switch avatars, dispose the old view and create a new one. Do NOT attempt to reuse or reset an existing AvatarView.
+ - `AvatarSDK.initialize()` and the session token do not need to be set again.
+ - The old AvatarView's internal state is fully cleaned up by `dispose()`.
 
  ```typescript
- // Before switching characters, must clean up old AvatarView first
+ // 1. Dispose old avatar
  if (currentAvatarView) {
    currentAvatarView.dispose()
-   currentAvatarView = null
  }
 
- // Load new character
- const newAvatar = await avatarManager.load('new-character-id')
+ // 2. Load new avatar (SDK is already initialized, token is still valid)
+ const newAvatar = await AvatarManager.shared.load('new-character-id')
 
- // Create new AvatarView (with same or different playback mode)
- currentAvatarView = new AvatarView(newAvatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.network
- })
+ // 3. Create new AvatarView
+ currentAvatarView = new AvatarView(newAvatar, container)
 
- // Network mode: start connection
- if (currentAvatarView.playbackMode === AvatarPlaybackMode.network) {
-   await currentAvatarView.avatarController.start()
- }
+ // 4. Start connection if SDK mode
+ await currentAvatarView.controller.start()
  ```
 
  ### AvatarController
 
- Audio/animation playback controller (playback layer), manages synchronized playback of audio and animation. Automatically composes `NetworkLayer` in network mode.
+ Audio/animation playback controller; manages synchronized playback of audio and animation. Automatically handles network communication in SDK mode.
 
  **Two Usage Patterns:**
 
- #### Network Mode Methods
+ #### SDK Mode Methods
 
  ```typescript
- // Start WebSocket service
- await avatarView.avatarController.start()
-
- // Send audio data (SDK handles receiving animation data automatically)
- avatarView.avatarController.send(audioData: ArrayBuffer, end: boolean)
- // audioData: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
- // - Sample rate: 16kHz (16000 Hz) - backend requirement
- // - Format: PCM16 (16-bit signed integer, little-endian)
- // - Channels: Mono (single channel)
- // - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
- // end: false (default) - Normal audio data sending, server will accumulate audio data, automatically returns animation data and starts synchronized playback of animation and audio after accumulating enough data
- // end: true - Immediately return animation data, no longer accumulating, used for ending current conversation or scenarios requiring immediate response
+ // ⚠️ CRITICAL: Initialize the audio context first.
+ // initializeAudioContext() MUST be called within a user gesture event handler
+ // (click, touchstart, etc.) to satisfy browser autoplay policies; calling it
+ // outside a user gesture will fail. All audio operations (start, send, etc.)
+ // require prior initialization.
+ button.addEventListener('click', async () => {
+   // Initialize audio context - MUST be in user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // Start service
+   await avatarView.controller.start()
+
+   // Send audio data (must be mono PCM16 matching the configured sample rate)
+   const conversationId = avatarView.controller.send(audioData: ArrayBuffer, end: boolean)
+   // Returns: conversationId - Conversation ID for this conversation session
+   // end: false (default) - Continue sending audio data for the current conversation
+   // end: true - Mark the end of audio input for the current conversation round.
+   //   The avatar keeps playing the remaining animation until finished, then
+   //   automatically returns to idle (notified via onConversationState). After
+   //   end=true, sending new audio data interrupts any ongoing playback from the
+   //   previous conversation round.
+ })
 
- // Close WebSocket service
- avatarView.avatarController.close()
+ // Close service
+ avatarView.controller.close()
  ```
 
- #### External Data Mode Methods
+ #### Host Mode Methods
 
  ```typescript
- // Start playback with initial audio and animation data
- await avatarView.avatarController.play(
-   initialAudioChunks?: Array<{ data: Uint8Array, isLast: boolean }>, // Initial audio chunks (16kHz mono PCM16)
-   initialKeyframes?: any[] // Initial animation keyframes (obtained from your service)
- )
+ // ⚠️ CRITICAL: Initialize the audio context first.
+ // initializeAudioContext() MUST be called within a user gesture event handler
+ // (click, touchstart, etc.) to satisfy browser autoplay policies; calling it
+ // outside a user gesture will fail. All audio operations (yieldAudioData,
+ // yieldFramesData, etc.) require prior initialization.
+ button.addEventListener('click', async () => {
+   // Initialize audio context - MUST be in user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // Stream audio chunks (must be mono PCM16 matching the configured sample rate)
+   const conversationId = avatarView.controller.yieldAudioData(
+     data: Uint8Array, // Audio chunk data (PCM16 format)
+     isLast: boolean = false // Whether this is the last chunk
+   )
+   // Returns: conversationId - Conversation ID for this audio session
+
+   // Stream animation keyframes (requires the conversationId from the audio data)
+   avatarView.controller.yieldFramesData(
+     keyframesDataArray: (Uint8Array | ArrayBuffer)[], // Animation keyframes binary data array
+     conversationId: string // Conversation ID (required)
+   )
+ })
+ ```
 
- // Stream additional audio chunks (after play() is called)
- avatarView.avatarController.sendAudioChunk(
-   data: Uint8Array, // Audio chunk data
-   isLast: boolean = false // Whether this is the last chunk
- )
+ **⚠️ Important: Conversation ID (conversationId) Management**
 
- // Stream additional animation keyframes (after play() is called)
- avatarView.avatarController.sendKeyframes(
-   keyframes: any[] // Additional animation keyframes (obtained from your service)
- )
- ```
+ **SDK Mode:**
+ - `send()` returns a conversationId to distinguish each conversation round
+ - `end=true` marks the end of a conversation round
+
+ **Host Mode:**
+ - `yieldAudioData()` returns a conversationId (a new one is generated automatically when a new session starts)
+ - `yieldFramesData()` requires a valid conversationId parameter
+ - Animation data with a mismatched conversationId will be **discarded**
+ - Use `getCurrentConversationId()` to retrieve the currently active conversationId
 
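In Host mode you typically stream a long PCM16 buffer through `yieldAudioData()` in fixed-size chunks, marking only the final chunk with `isLast=true`. A minimal chunker sketch (hypothetical helper, not part of the SDK; the chunk size in the usage note is an arbitrary assumption):

```typescript
// Split a PCM16 buffer into fixed-size chunks for streaming.
// chunkBytes should be even, since PCM16 samples are 2 bytes each.
function* pcm16Chunks(
  pcm: ArrayBuffer,
  chunkBytes: number
): Generator<{ data: Uint8Array; isLast: boolean }> {
  const bytes = new Uint8Array(pcm)
  for (let offset = 0; offset < bytes.length; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, bytes.length)
    // isLast is true only for the chunk that reaches the end of the buffer
    yield { data: bytes.subarray(offset, end), isLast: end === bytes.length }
  }
}

// Usage sketch (assumes an initialized controller):
// let conversationId: string | null = null
// for (const { data, isLast } of pcm16Chunks(pcmBuffer, 6400)) {
//   conversationId = avatarView.controller.yieldAudioData(data, isLast)
// }
// // Pass the same conversationId to yieldFramesData(...)
```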
  #### Common Methods (Both Modes)
 
  ```typescript
+ // Pause playback (from playing state)
+ avatarView.controller.pause()
+
+ // Resume playback (from paused state)
+ await avatarView.controller.resume()
+
  // Interrupt current playback (stops and clears data)
- avatarView.avatarController.interrupt()
+ avatarView.controller.interrupt()
 
  // Clear all data and resources
- avatarView.avatarController.clear()
+ avatarView.controller.clear()
+
+ // Get current conversation ID (for Host mode)
+ const conversationId = avatarView.controller.getCurrentConversationId()
+ // Returns: Current conversationId for the active audio session, or null if no active session
+
+ // Volume control (affects only avatar audio player, not system volume)
+ avatarView.controller.setVolume(0.5) // Set volume to 50% (0.0 to 1.0)
+ const currentVolume = avatarView.controller.getVolume() // Get current volume (0.0 to 1.0)
 
  // Set event callbacks
- avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // Network mode only
- avatarView.avatarController.onAvatarState = (state: AvatarState) => {}
- avatarView.avatarController.onError = (error: Error) => {}
+ avatarView.controller.onConnectionState = (state: ConnectionState) => {} // SDK mode only
+ avatarView.controller.onConversationState = (state: ConversationState) => {}
+ avatarView.controller.onError = (error: AvatarError) => {} // Includes error.code for the specific error type
+ ```
+
+ #### Avatar Transform Methods
+
+ ```typescript
+ // Get or set avatar transform (position and scale in canvas)
+ // Get current transform
+ const currentTransform = avatarView.avatarTransform // { x: number, y: number, scale: number }
+
+ // Set transform
+ avatarView.avatarTransform = { x, y, scale }
+ // - x: Horizontal offset in normalized coordinates (-1 to 1, where -1 = left edge, 0 = center, 1 = right edge)
+ // - y: Vertical offset in normalized coordinates (-1 to 1, where -1 = bottom edge, 0 = center, 1 = top edge)
+ // - scale: Scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
+ // Example:
+ avatarView.avatarTransform = { x: 0, y: 0, scale: 1.0 } // Center, original size
+ avatarView.avatarTransform = { x: 0.5, y: 0, scale: 2.0 } // Right half, double size
  ```
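The documented edge values (-1 = left/bottom edge, 1 = right/top edge) imply a linear mapping between container pixels and normalized transform coordinates. A hypothetical helper (not part of the SDK) that converts a pixel position with a top-left origin, such as a click position inside the container, into the `{ x, y }` values used by `avatarTransform`; the linearity of the mapping is an assumption derived from those edge values:

```typescript
// Convert a pixel position (top-left origin) inside a container of the given
// size to normalized avatarTransform coordinates:
//   x: -1 = left edge, 0 = center, 1 = right edge
//   y: -1 = bottom edge, 0 = center, 1 = top edge (pixel y axis points down)
function pixelToNormalized(
  px: number,
  py: number,
  width: number,
  height: number
): { x: number; y: number } {
  return {
    x: (2 * px) / width - 1,
    y: 1 - (2 * py) / height
  }
}

// pixelToNormalized(480, 270, 960, 540) is the container center: { x: 0, y: 0 }
```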
 
  **Important Notes:**
- - `start()` and `close()` are only available in network mode
- - `play()`, `sendAudioChunk()`, and `sendKeyframes()` are only available in external data mode
- - `interrupt()` and `clear()` are available in both modes
+ - `start()` and `close()` are only available in SDK mode
+ - `yieldAudioData()` and `yieldFramesData()` are only available in Host mode
+ - `pause()`, `resume()`, `interrupt()`, `clear()`, `getCurrentConversationId()`, `setVolume()`, and `getVolume()` are available in both modes
  - The playback mode is determined when creating `AvatarView` and cannot be changed
 
  ## 🔧 Configuration
@@ -389,40 +755,55 @@ avatarView.avatarController.onError = (error: Error) => {}
  ```typescript
  interface Configuration {
    environment: Environment
+   drivingServiceMode?: DrivingServiceMode // Optional, default is 'sdk' (SDK mode)
+   logLevel?: LogLevel // Optional, default is 'off' (no logs)
+   audioFormat?: AudioFormat // Optional, default is { channelCount: 1, sampleRate: 16000 }
+   characterApiBaseUrl?: string // Optional, internal debug config, can be ignored
  }
- ```
 
- **Description:**
- - `environment`: Specifies the environment (cn/us/test), SDK will automatically use the corresponding API address and WebSocket address based on the environment
- - `sessionToken`: Set separately via `AvatarKit.setSessionToken()`, not in Configuration
-
- ```typescript
- enum Environment {
-   cn = 'cn', // China region
-   us = 'us', // US region
-   test = 'test' // Test environment
+ interface AudioFormat {
+   readonly channelCount: 1 // Fixed to 1 (mono)
+   readonly sampleRate: number // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz; default: 16000
  }
  ```
 
- ### AvatarViewOptions
+ ### LogLevel
+
+ Control the verbosity of SDK logs:
 
  ```typescript
- interface AvatarViewOptions {
-   playbackMode?: AvatarPlaybackMode // Playback mode, default is 'network'
-   container?: HTMLElement // Canvas container element
+ enum LogLevel {
+   off = 'off', // Disable all logs (default)
+   error = 'error', // Only error logs
+   warning = 'warning', // Warning and error logs
+   all = 'all' // All logs (info, warning, error)
  }
  ```
 
+ **Note:** `LogLevel.off` completely disables all logging, including error logs. Use with caution in production environments.
+
  **Description:**
- - `playbackMode`: Specifies the playback mode (`'network'` or `'external'`), default is `'network'`
-   - `'network'`: SDK handles WebSocket communication, send audio via `send()`
-   - `'external'`: External components provide audio and animation data, SDK handles synchronized playback
- - `container`: Optional container element for Canvas, if not provided, Canvas will be created but not added to DOM
+ - `environment`: Specifies the environment (cn/intl); the SDK automatically uses the corresponding server addresses
+ - `drivingServiceMode`: Specifies the driving service mode
+   - `DrivingServiceMode.sdk` (default): SDK mode - SDK handles network communication automatically
+   - `DrivingServiceMode.host`: Host mode - Host application provides audio and animation data
+ - `logLevel`: Controls the verbosity of SDK logs
+   - `LogLevel.off` (default): Disable all logs
+   - `LogLevel.error`: Only error logs
+   - `LogLevel.warning`: Warning and error logs
+   - `LogLevel.all`: All logs (info, warning, error)
+ - `audioFormat`: Configures audio sample rate and channel count
+   - `channelCount`: Fixed to 1 (mono channel)
+   - `sampleRate`: Audio sample rate in Hz (default: 16000)
+     - Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000
+     - The configured sample rate is used for both audio recording and playback
+ - `characterApiBaseUrl`: Internal debug config, can be ignored
+ - `sessionToken`: **Required for authentication**. Set separately via `AvatarSDK.setSessionToken()`, not in Configuration. See the [Authentication](#-authentication) section for details
 
  ```typescript
- enum AvatarPlaybackMode {
-   network = 'network', // Network mode: SDK handles WebSocket communication
-   external = 'external' // External data mode: External provides data, SDK handles playback
+ enum Environment {
+   cn = 'cn', // China region
+   intl = 'intl' // International region
  }
  ```
 
@@ -453,89 +834,42 @@ enum ConnectionState {
  }
  ```
 
- ### AvatarState
+ ### ConversationState
 
  ```typescript
- enum AvatarState {
-   idle = 'idle', // Idle state, showing breathing animation
-   active = 'active', // Active, waiting for playable content
-   playing = 'playing' // Playing
+ enum ConversationState {
+   idle = 'idle', // Idle state (breathing animation)
+   playing = 'playing', // Playing state (active conversation)
+   pausing = 'pausing' // Pausing state (paused during playback)
  }
  ```
 
- ## 🎨 Rendering System
-
- The SDK supports two rendering backends:
-
- - **WebGPU** - High-performance rendering for modern browsers
- - **WebGL** - Better compatibility traditional rendering
-
- The rendering system automatically selects the best backend, no manual configuration needed.
-
- ## 🔍 Debugging and Monitoring
-
- ### Logging System
-
- The SDK has a built-in complete logging system, supporting different levels of log output:
-
- ```typescript
- import { logger } from '@spatialwalk/avatarkit'
-
- // Set log level
- logger.setLevel('verbose') // 'basic' | 'verbose'
-
- // Manual log output
- logger.log('Info message')
- logger.warn('Warning message')
- logger.error('Error message')
- ```
-
- ### Performance Monitoring
-
- The SDK provides performance monitoring interfaces to monitor rendering performance:
-
- ```typescript
- // Get rendering performance statistics
- const stats = avatarView.getPerformanceStats()
+ **State Description:**
+ - `idle`: Avatar is idle (breathing animation), waiting for a conversation to start
+ - `playing`: Avatar is playing conversation content (including during transition animations)
+ - `pausing`: Avatar playback is paused (e.g., when `end=false` and the avatar is waiting for more audio data)
 
- if (stats) {
-   console.log(`Render time: ${stats.renderTime.toFixed(2)}ms`)
-   console.log(`Sort time: ${stats.sortTime.toFixed(2)}ms`)
-   console.log(`Rendering backend: ${stats.backend}`)
-
-   // Calculate frame rate
-   const fps = 1000 / stats.renderTime
-   console.log(`Frame rate: ${fps.toFixed(2)} FPS`)
- }
+ **Note:** During transition animations, the target state is notified immediately:
+ - When transitioning from `idle` to `playing`, the `playing` state is notified immediately
+ - When transitioning from `playing` to `idle`, the `idle` state is notified immediately
 
- // Regular performance monitoring
- setInterval(() => {
-   const stats = avatarView.getPerformanceStats()
-   if (stats) {
-     // Send to monitoring service or display on UI
-     console.log('Performance:', stats)
-   }
- }, 1000)
- ```
+ ## 🎨 Rendering System
 
- **Performance Statistics Description:**
- - `renderTime`: Total rendering time (milliseconds), includes sorting and GPU rendering
- - `sortTime`: Sorting time (milliseconds), uses Radix Sort algorithm to depth-sort point cloud
- - `backend`: Currently used rendering backend (`'webgpu'` | `'webgl'` | `null`)
+ The SDK automatically selects the best rendering backend for your browser; no manual configuration is needed.
 
  ## 🚨 Error Handling
 
- ### SPAvatarError
+ ### AvatarError
 
  The SDK uses custom error types, providing more detailed error information:
 
  ```typescript
- import { SPAvatarError } from '@spatialwalk/avatarkit'
+ import { AvatarError } from '@spatialwalk/avatarkit'
 
  try {
-   await avatarView.avatarController.start()
+   await avatarView.controller.start()
  } catch (error) {
-   if (error instanceof SPAvatarError) {
+   if (error instanceof AvatarError) {
      console.error('SDK Error:', error.message, error.code)
    } else {
      console.error('Unknown error:', error)
@@ -546,113 +880,91 @@ try {
  ### Error Callbacks
 
  ```typescript
- avatarView.avatarController.onError = (error: Error) => {
-   console.error('AvatarController error:', error)
-   // Handle error, such as reconnection, user notification, etc.
+ import { AvatarError } from '@spatialwalk/avatarkit'
+
+ avatarView.controller.onError = (error: AvatarError) => {
+   console.error('Error:', error.code, error.message)
  }
  ```
 
+ `error.code` values (from the `ErrorCode` enum):
+
+ | Code | Description | Trigger |
+ |------|-------------|---------|
+ | **Authentication & Authorization** | | |
+ | `appIDUnrecognized` | App ID not recognized | Reserved |
+ | `sessionTokenInvalid` | Token invalid or appId mismatch | WebSocket close code 4010 |
+ | `sessionTokenExpired` | Token expired | WebSocket close code 4010 |
+ | `insufficientBalance` | Insufficient balance | WebSocket close code 4001 |
+ | `concurrentLimitExceeded` | Concurrent connection limit exceeded | WebSocket close code 4003 |
+ | **Resource Loading** | | |
+ | `avatarIDUnrecognized` | Avatar ID not found | Server error |
+ | `failedToFetchAvatarMetadata` | Metadata fetch failed | Network/server error |
+ | `failedToDownloadAvatarAssets` | Asset download failed | Network/server error |
+ | **Connection** | | |
+ | `websocketError` | WebSocket handshake or network error | Connection failure |
+ | `websocketClosedAbnormally` | Connection closed abnormally | Close code 1006 |
+ | `websocketClosedUnexpected` | Unexpected close code | Unknown close code |
+ | `sessionTimeout` | Session timeout | WebSocket close code 4002 |
+ | `connectionInProgress` | Connection already in progress | Duplicate `start()` call |
+ | **Playback** | | |
+ | `networkLayerNotAvailable` | Network layer not available | `send()` in Host mode |
+ | `playbackStartFailed` | Failed to start playback | Internal error |
+ | `playbackInitFailed` | Playback initialization failed | Internal error |
+ | `audioOnlyInitFailed` | Audio-only playback init failed | Fallback mode error |
+ | `noAudio` | No audio data to play | Empty audio input |
+ | `audioContextNotInitialized` | Audio context not initialized | `send()` before `initializeAudioContext()` |
+ | `animationPlayerNotInitialized` | Animation player not initialized | Internal error |
+ | **Server** | | |
+ | `serverError` | Server-side error | Server MESSAGE_SERVER_ERROR |
+
921
  ## 🔄 Resource Management
556
922
 
557
923
  ### Lifecycle Management
558
924
 
559
- #### Network Mode Lifecycle
925
+ #### SDK Mode Lifecycle
560
926
 
561
927
  ```typescript
562
928
  // Initialize
563
929
  const container = document.getElementById('avatar-container')
564
- const avatarView = new AvatarView(avatar, {
565
- container: container,
566
- playbackMode: AvatarPlaybackMode.network
567
- })
568
- await avatarView.avatarController.start()
930
+ const avatarView = new AvatarView(avatar, container)
931
+ await avatarView.controller.start()
569
932
 
570
933
  // Use
571
- avatarView.avatarController.send(audioData, false)
934
+ avatarView.controller.send(audioData, false)
572
935
 
573
- // Cleanup
574
- avatarView.avatarController.close()
575
- avatarView.dispose() // Automatically cleans up all resources
936
+ // Cleanup - dispose() automatically cleans up all resources including connections
937
+ avatarView.dispose()
576
938
  ```
577
939
 
578
- #### External Data Mode Lifecycle
940
+ #### Host Mode Lifecycle
579
941
 
580
942
  ```typescript
581
943
  // Initialize
582
944
  const container = document.getElementById('avatar-container')
583
- const avatarView = new AvatarView(avatar, {
584
- container: container,
585
- playbackMode: AvatarPlaybackMode.external
586
- })
945
+ const avatarView = new AvatarView(avatar, container)
587
946
 
588
947
  // Use
589
- const initialAudioChunks = [{ data: audioData1, isLast: false }]
590
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
591
- avatarView.avatarController.sendAudioChunk(audioChunk, false)
592
- avatarView.avatarController.sendKeyframes(keyframes)
948
+ const conversationId = avatarView.controller.yieldAudioData(audioChunk, false)
949
+ avatarView.controller.yieldFramesData(keyframesDataArray, conversationId)
593
950
 
594
- // Cleanup
595
- avatarView.avatarController.clear() // Clear all data and resources
596
- avatarView.dispose() // Automatically cleans up all resources
951
+ // Cleanup - dispose() automatically cleans up all resources including playback data
952
+ avatarView.dispose()
597
953
  ```
598
954
 
599
- **⚠️ Important Notes:**
600
- - SDK currently only supports one AvatarView instance at a time
601
- - When switching characters, must first call `dispose()` to clean up old AvatarView, then create new instance
602
- - Not properly cleaning up may cause resource leaks and rendering errors
603
- - In network mode, call `close()` before `dispose()` to properly close WebSocket connections
604
- - In external data mode, call `clear()` before `dispose()` to clear all playback data
+ **⚠️ Important Notes:**
+ - `dispose()` automatically cleans up all resources, including:
+   - Network connections (SDK mode)
+   - Playback data and animation resources (both modes)
+   - The render system and canvas elements
+   - All event listeners and callbacks
+ - Failing to call `dispose()` may cause resource leaks and rendering errors
+ - To close connections or clear playback data manually before disposing, you can call `avatarView.controller.close()` (SDK mode) or `avatarView.controller.clear()` (both modes) first, but this is optional: `dispose()` handles it automatically
 
  ### Memory Optimization
 
- - SDK automatically manages WASM memory allocation
- - Supports dynamic loading/unloading of character and animation resources
- - Provides a memory usage monitoring interface
-
- ### Audio Data Sending
-
- #### Network Mode
-
- The `send()` method receives audio data in `ArrayBuffer` format:
-
- **Audio Format Requirements:**
- - **Sample Rate**: 16kHz (16000 Hz) - **backend requirement, must be exactly 16kHz**
- - **Format**: PCM16 (16-bit signed integer, little-endian)
- - **Channels**: Mono (single channel)
- - **Data Size**: Each sample is 2 bytes, so 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
-
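The data-size arithmetic above (16kHz mono PCM16, 2 bytes per sample, so 1 second = 32000 bytes) can be captured in two small helpers; `pcm16BytesForDuration` and `pcm16DurationForBytes` are hypothetical names used for illustration.

```typescript
// Byte/duration math for 16kHz mono PCM16 audio, per the format
// requirements above: 16000 samples/s × 2 bytes/sample = 32000 bytes/s.
const SAMPLE_RATE = 16000
const BYTES_PER_SAMPLE = 2

function pcm16BytesForDuration(seconds: number): number {
  return Math.round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
}

function pcm16DurationForBytes(bytes: number): number {
  return bytes / BYTES_PER_SAMPLE / SAMPLE_RATE
}
```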
- **Usage:**
- - `audioData`: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
- - `end=false` (default) - Normal audio data sending; the server accumulates audio data, automatically returns animation data, and starts synchronized playback of animation and audio once enough data has accumulated
- - `end=true` - Immediately returns animation data without further accumulation; used to end the current conversation or for scenarios requiring an immediate response
- - **Important**: There is no need to wait for `end=true` to start playing; playback starts automatically once enough audio data has accumulated
-
- #### External Data Mode
-
- The `play()` method starts playback with initial data; then use `sendAudioChunk()` to stream additional audio:
-
- **Audio Format Requirements:**
- - Same as network mode: 16kHz mono PCM16 format
- - Audio data should be provided as `Uint8Array` chunks with an `isLast` flag
-
- **Usage:**
- ```typescript
- // Start playback with initial audio and animation data
- // Note: audio and animation data should be obtained from your backend service
- const initialAudioChunks = [
-   { data: audioData1, isLast: false },
-   { data: audioData2, isLast: false }
- ]
- await avatarController.play(initialAudioChunks, initialKeyframes)
-
- // Stream additional audio chunks
- avatarController.sendAudioChunk(audioChunk, isLast)
- ```
-
- **Resampling (Both Modes):**
- - If your audio source uses a different sample rate (e.g., 24kHz, 48kHz), you **must** resample it to 16kHz before sending
- - For high-quality resampling, use the Web Audio API's `OfflineAudioContext`, which applies anti-aliasing filtering
- - See the example projects (`vanilla`, `react`, `vue`) for a complete resampling implementation
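Where `OfflineAudioContext` is unavailable (for example, outside a browser), a naive linear-interpolation resampler can stand in. This is a different, lower-quality technique than the `OfflineAudioContext` approach recommended above, since it applies no anti-aliasing filter; `resampleTo16k` is an illustrative name.

```typescript
// Naive linear-interpolation downsampler to 16kHz. No anti-aliasing is
// applied, so prefer OfflineAudioContext-based resampling in production.
function resampleTo16k(input: Float32Array, srcRate: number): Float32Array {
  const ratio = srcRate / 16000
  const outLength = Math.floor(input.length / ratio)
  const output = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, input.length - 1)
    const frac = pos - i0
    // Linearly interpolate between the two neighboring source samples
    output[i] = input[i0] * (1 - frac) + input[i1] * frac
  }
  return output
}
```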
+ - SDK automatically manages memory allocation
+ - Supports dynamic loading/unloading of avatar and animation resources
 
  ## 🌐 Browser Compatibility
 
@@ -672,6 +984,5 @@ Issues and Pull Requests are welcome!
  ## 📞 Support
 
  For questions, please contact:
- - Email: support@spavatar.com
- - Documentation: https://docs.spavatar.com
- - GitHub: https://github.com/spavatar/sdk
+ - Email: code@spatialwalk.net
+ - Documentation: https://docs.spatialreal.ai