@spatialwalk/avatarkit 1.0.0-beta.7 → 1.0.0-beta.70

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (101)
  1. package/CHANGELOG.md +595 -10
  2. package/README.md +475 -312
  3. package/dist/StreamingAudioPlayer-Bi2685bX.js +633 -0
  4. package/dist/animation/AnimationWebSocketClient.d.ts +18 -7
  5. package/dist/animation/utils/eventEmitter.d.ts +0 -1
  6. package/dist/animation/utils/flameConverter.d.ts +0 -1
  7. package/dist/audio/AnimationPlayer.d.ts +19 -1
  8. package/dist/audio/StreamingAudioPlayer.d.ts +41 -9
  9. package/dist/avatar_core_wasm-Dv943JJl.js +2696 -0
  10. package/dist/{avatar_core_wasm.wasm → avatar_core_wasm-e68766db.wasm} +0 -0
  11. package/dist/config/app-config.d.ts +3 -4
  12. package/dist/config/constants.d.ts +10 -18
  13. package/dist/config/sdk-config-loader.d.ts +4 -10
  14. package/dist/core/Avatar.d.ts +2 -14
  15. package/dist/core/AvatarController.d.ts +95 -85
  16. package/dist/core/AvatarDownloader.d.ts +7 -92
  17. package/dist/core/AvatarManager.d.ts +22 -12
  18. package/dist/core/AvatarSDK.d.ts +35 -0
  19. package/dist/core/AvatarView.d.ts +55 -140
  20. package/dist/core/NetworkLayer.d.ts +7 -59
  21. package/dist/generated/common/v1/models.d.ts +36 -0
  22. package/dist/generated/driveningress/v1/driveningress.d.ts +0 -1
  23. package/dist/generated/driveningress/v2/driveningress.d.ts +82 -1
  24. package/dist/generated/google/protobuf/struct.d.ts +0 -1
  25. package/dist/generated/google/protobuf/timestamp.d.ts +0 -1
  26. package/dist/index-CvW_c7G-.js +16434 -0
  27. package/dist/index.d.ts +2 -4
  28. package/dist/index.js +17 -18
  29. package/dist/renderer/RenderSystem.d.ts +9 -79
  30. package/dist/renderer/covariance.d.ts +3 -11
  31. package/dist/renderer/renderer.d.ts +6 -2
  32. package/dist/renderer/sortSplats.d.ts +3 -10
  33. package/dist/renderer/webgl/reorderData.d.ts +4 -11
  34. package/dist/renderer/webgl/webglRenderer.d.ts +34 -4
  35. package/dist/renderer/webgpu/webgpuRenderer.d.ts +30 -5
  36. package/dist/types/character-settings.d.ts +1 -1
  37. package/dist/types/character.d.ts +3 -15
  38. package/dist/types/index.d.ts +123 -43
  39. package/dist/utils/animation-interpolation.d.ts +4 -15
  40. package/dist/utils/client-id.d.ts +6 -0
  41. package/dist/utils/conversationId.d.ts +10 -0
  42. package/dist/utils/error-utils.d.ts +0 -1
  43. package/dist/utils/id-manager.d.ts +34 -0
  44. package/dist/utils/logger.d.ts +2 -11
  45. package/dist/utils/posthog-tracker.d.ts +8 -0
  46. package/dist/utils/pwa-cache-manager.d.ts +17 -0
  47. package/dist/utils/usage-tracker.d.ts +6 -0
  48. package/dist/vanilla/vite.config.d.ts +2 -0
  49. package/dist/vite.d.ts +19 -0
  50. package/dist/wasm/avatarCoreAdapter.d.ts +15 -126
  51. package/dist/wasm/avatarCoreMemory.d.ts +5 -2
  52. package/package.json +19 -8
  53. package/vite.d.ts +20 -0
  54. package/vite.js +126 -0
  55. package/dist/StreamingAudioPlayer-D7s8q5h0.js +0 -319
  56. package/dist/StreamingAudioPlayer-D7s8q5h0.js.map +0 -1
  57. package/dist/animation/AnimationWebSocketClient.d.ts.map +0 -1
  58. package/dist/animation/utils/eventEmitter.d.ts.map +0 -1
  59. package/dist/animation/utils/flameConverter.d.ts.map +0 -1
  60. package/dist/audio/AnimationPlayer.d.ts.map +0 -1
  61. package/dist/audio/StreamingAudioPlayer.d.ts.map +0 -1
  62. package/dist/avatar_core_wasm-D4eEi7Eh.js +0 -1666
  63. package/dist/avatar_core_wasm-D4eEi7Eh.js.map +0 -1
  64. package/dist/config/app-config.d.ts.map +0 -1
  65. package/dist/config/constants.d.ts.map +0 -1
  66. package/dist/config/sdk-config-loader.d.ts.map +0 -1
  67. package/dist/core/Avatar.d.ts.map +0 -1
  68. package/dist/core/AvatarController.d.ts.map +0 -1
  69. package/dist/core/AvatarDownloader.d.ts.map +0 -1
  70. package/dist/core/AvatarKit.d.ts +0 -66
  71. package/dist/core/AvatarKit.d.ts.map +0 -1
  72. package/dist/core/AvatarManager.d.ts.map +0 -1
  73. package/dist/core/AvatarView.d.ts.map +0 -1
  74. package/dist/core/NetworkLayer.d.ts.map +0 -1
  75. package/dist/generated/driveningress/v1/driveningress.d.ts.map +0 -1
  76. package/dist/generated/driveningress/v2/driveningress.d.ts.map +0 -1
  77. package/dist/generated/google/protobuf/struct.d.ts.map +0 -1
  78. package/dist/generated/google/protobuf/timestamp.d.ts.map +0 -1
  79. package/dist/index-CpSvWi6A.js +0 -6026
  80. package/dist/index-CpSvWi6A.js.map +0 -1
  81. package/dist/index.d.ts.map +0 -1
  82. package/dist/index.js.map +0 -1
  83. package/dist/renderer/RenderSystem.d.ts.map +0 -1
  84. package/dist/renderer/covariance.d.ts.map +0 -1
  85. package/dist/renderer/renderer.d.ts.map +0 -1
  86. package/dist/renderer/sortSplats.d.ts.map +0 -1
  87. package/dist/renderer/webgl/reorderData.d.ts.map +0 -1
  88. package/dist/renderer/webgl/webglRenderer.d.ts.map +0 -1
  89. package/dist/renderer/webgpu/webgpuRenderer.d.ts.map +0 -1
  90. package/dist/types/character-settings.d.ts.map +0 -1
  91. package/dist/types/character.d.ts.map +0 -1
  92. package/dist/types/index.d.ts.map +0 -1
  93. package/dist/utils/animation-interpolation.d.ts.map +0 -1
  94. package/dist/utils/cls-tracker.d.ts +0 -17
  95. package/dist/utils/cls-tracker.d.ts.map +0 -1
  96. package/dist/utils/error-utils.d.ts.map +0 -1
  97. package/dist/utils/logger.d.ts.map +0 -1
  98. package/dist/utils/reqId.d.ts +0 -20
  99. package/dist/utils/reqId.d.ts.map +0 -1
  100. package/dist/wasm/avatarCoreAdapter.d.ts.map +0 -1
  101. package/dist/wasm/avatarCoreMemory.d.ts.map +0 -1
package/README.md CHANGED
@@ -1,4 +1,4 @@
- # SPAvatarKit SDK
+ # AvatarKit SDK
 
  Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supporting audio-driven animation rendering and high-quality 3D rendering.
 
@@ -6,6 +6,7 @@ Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supportin
 
  - **3D Gaussian Splatting Rendering** - Based on the latest point cloud rendering technology, providing high-quality 3D virtual avatars
  - **Audio-Driven Real-Time Animation Rendering** - Users provide audio data, SDK handles receiving animation data and rendering
+ - **Multi-Avatar Support** - Support multiple avatar instances simultaneously, each with independent state and rendering
  - **WebGPU/WebGL Dual Rendering Backend** - Automatically selects the best rendering backend for compatibility
  - **WASM High-Performance Computing** - Uses C++ compiled WebAssembly modules for geometric calculations
  - **TypeScript Support** - Complete type definitions and IntelliSense
@@ -17,90 +18,179 @@ Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supportin
  npm install @spatialwalk/avatarkit
  ```
 
+ ## 🔧 Vite Configuration (Recommended)
+
+ If you use Vite as your build tool, we strongly recommend our Vite plugin to handle the WASM file configuration automatically. The plugin applies all necessary settings, so no manual configuration is needed.
+
+ ### Using the Plugin
+
+ Add the plugin in `vite.config.ts`:
+
+ ```typescript
+ import { defineConfig } from 'vite'
+ import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'
+
+ export default defineConfig({
+   plugins: [
+     avatarkitVitePlugin(), // just add this line
+   ],
+ })
+ ```
+
+ ### Plugin Features
+
+ The plugin automatically handles:
+
+ - ✅ **Dev server**: sets the correct MIME type (`application/wasm`) for WASM files
+ - ✅ **Build**: copies WASM files to the `dist/assets/` directory
+   - Smart detection: extracts the referenced WASM filenames (including hashes) from the JS glue file
+   - Automatic matching: ensures the copied WASM files match the references in the JS glue file
+   - Hash support: correctly handles hashed WASM files (e.g. `avatar_core_wasm-{hash}.wasm`)
+ - ✅ **WASM JS glue**: copies the WASM JS glue file to the `dist/assets/` directory
+ - ✅ **Cloudflare Pages**: generates a `_headers` file so WASM files are served with the correct MIME type
+ - ✅ **Vite config**: configures `optimizeDeps`, `assetsInclude`, `assetsInlineLimit`, and related options
+
+ ### Manual Configuration (Without the Plugin)
+
+ If you do not use the Vite plugin, you need to configure the following manually. Note that `configureServer` is a plugin hook, so the dev-server middleware must be registered from inside a plugin:
+
+ ```typescript
+ // vite.config.ts
+ import { defineConfig } from 'vite'
+
+ export default defineConfig({
+   optimizeDeps: {
+     exclude: ['@spatialwalk/avatarkit'],
+   },
+   assetsInclude: ['**/*.wasm'],
+   build: {
+     assetsInlineLimit: 0,
+     rollupOptions: {
+       output: {
+         assetFileNames: (assetInfo) => {
+           if (assetInfo.name?.endsWith('.wasm')) {
+             return 'assets/[name][extname]'
+           }
+           return 'assets/[name]-[hash][extname]'
+         },
+       },
+     },
+   },
+   plugins: [
+     {
+       name: 'wasm-mime-type',
+       // The dev server needs middleware that sets the WASM MIME type
+       configureServer(server) {
+         server.middlewares.use((req, res, next) => {
+           if (req.url?.endsWith('.wasm')) {
+             res.setHeader('Content-Type', 'application/wasm')
+           }
+           next()
+         })
+       },
+     },
+   ],
+ })
+ ```
+
  ## 🎯 Quick Start
 
+ ### ⚠️ Important: Audio Context Initialization
+
+ **Before using any audio-related features, you MUST initialize the audio context in a user gesture context** (e.g., `click` or `touchstart` event handlers). This is required by browser security policies. Calling `initializeAudioContext()` outside a user gesture will fail.
+
  ### Basic Usage
 
  ```typescript
  import {
-   AvatarKit,
+   AvatarSDK,
    AvatarManager,
    AvatarView,
    Configuration,
-   Environment
+   Environment,
+   DrivingServiceMode,
+   LogLevel
  } from '@spatialwalk/avatarkit'
 
  // 1. Initialize SDK
+
  const configuration: Configuration = {
-   environment: Environment.test,
+   environment: Environment.cn,
+   drivingServiceMode: DrivingServiceMode.sdk, // Optional, 'sdk' is default
+   // - DrivingServiceMode.sdk: SDK mode - SDK handles WebSocket communication
+   // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
+   logLevel: LogLevel.off, // Optional, 'off' is default
+   // - LogLevel.off: Disable all logs
+   // - LogLevel.error: Only error logs
+   // - LogLevel.warning: Warning and error logs
+   // - LogLevel.all: All logs (info, warning, error)
+   audioFormat: { // Optional, default is { channelCount: 1, sampleRate: 16000 }
+     channelCount: 1, // Fixed to 1 (mono)
+     sampleRate: 16000 // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
+   }
+   // characterApiBaseUrl: 'https://custom-api.example.com' // Optional, internal debug config, can be ignored
  }
 
- await AvatarKit.initialize('your-app-id', configuration)
+ await AvatarSDK.initialize('your-app-id', configuration)
 
  // Set sessionToken (if needed, call separately)
- // AvatarKit.setSessionToken('your-session-token')
+ // AvatarSDK.setSessionToken('your-session-token')
 
- // 2. Load character
- const avatarManager = new AvatarManager()
+ // 2. Load avatar
+ const avatarManager = AvatarManager.shared
  const avatar = await avatarManager.load('character-id', (progress) => {
    console.log(`Loading progress: ${progress.progress}%`)
  })
 
  // 3. Create view (automatically creates Canvas and AvatarController)
- // Network mode (default)
+ // The playback mode is determined by drivingServiceMode in the AvatarSDK configuration
+ // - DrivingServiceMode.sdk: SDK mode - SDK handles WebSocket communication
+ // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar, {
-   container: container,
-   playbackMode: 'network' // Optional, 'network' is default
+ const avatarView = new AvatarView(avatar, container)
+
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
+ button.addEventListener('click', async () => {
+   // Initialize audio context - MUST be in user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // 5. Start real-time communication (SDK mode only)
+   await avatarView.controller.start()
+
+   // 6. Send audio data (SDK mode; must be mono PCM16 matching the configured sample rate)
+   // audioData: ArrayBuffer or Uint8Array containing PCM16 audio samples
+   // - PCM files: can be read directly as ArrayBuffer
+   // - WAV files: extract the PCM data from the WAV container (may require resampling)
+   // - MP3 files: decode first (e.g., using AudioContext.decodeAudioData()), then convert to PCM16
+   const audioData = new ArrayBuffer(1024) // Placeholder: replace with actual PCM16 audio data
+   avatarView.controller.send(audioData, false) // Send audio data
+   avatarView.controller.send(audioData, true) // end=true marks the end of the current conversation round
  })
-
- // 4. Start real-time communication (network mode only)
- await avatarView.avatarController.start()
-
- // 5. Send audio data (network mode)
- // ⚠️ Important: Audio must be 16kHz mono PCM16 format
- // If audio is Uint8Array, you can use slice().buffer to convert to ArrayBuffer
- const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio data (512 samples = 1024 bytes)
- const audioData = audioUint8.slice().buffer // Simplified conversion, works for ArrayBuffer and SharedArrayBuffer
- avatarView.avatarController.send(audioData, false) // Send audio data, will automatically start playing after accumulating enough data
- avatarView.avatarController.send(audioData, true) // end=true means immediately return animation data, no longer accumulating
  ```
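Editor's note: the two `send()` calls above compress what is usually a streaming loop; in practice a long PCM16 buffer is sliced into chunks and only the final chunk is flagged with `end=true`. A minimal sketch of that loop, assuming a hypothetical `chunkPcm16` helper (not an SDK API):

```typescript
// Split a PCM16 buffer into fixed-size chunks, flagging the final chunk so the
// caller can pass end=true on the last send() call. Hypothetical helper, not an SDK API.
function chunkPcm16(buffer: ArrayBuffer, chunkBytes: number): { data: ArrayBuffer; isLast: boolean }[] {
  const chunks: { data: ArrayBuffer; isLast: boolean }[] = []
  for (let offset = 0; offset < buffer.byteLength; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, buffer.byteLength)
    chunks.push({ data: buffer.slice(offset, end), isLast: end === buffer.byteLength })
  }
  return chunks
}

// Usage (inside the user-gesture handler shown above):
// for (const { data, isLast } of chunkPcm16(pcmBuffer, 3200)) {
//   avatarView.controller.send(data, isLast)
// }
```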
 
- ### External Data Mode Example
+ ### Host Mode Example
 
  ```typescript
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
 
- // 1-3. Same as network mode (initialize SDK, load character)
+ // 1-3. Same as SDK mode (initialize SDK, load avatar)
 
- // 3. Create view with external data mode
+ // 3. Create view with Host mode
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.external
- })
-
- // 4. Start playback with initial data (obtained from your service)
- // Note: Audio and animation data should be obtained from your backend service
- const initialAudioChunks = [{ data: audioData1, isLast: false }, { data: audioData2, isLast: false }]
- const initialKeyframes = animationData1 // Animation keyframes from your service
-
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
-
- // 5. Stream additional data as needed
- avatarView.avatarController.sendAudioChunk(audioData3, false)
- avatarView.avatarController.sendKeyframes(animationData2)
+ const avatarView = new AvatarView(avatar, container)
+
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
+ button.addEventListener('click', async () => {
+   // Initialize audio context - MUST be in user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // 5. Host Mode Workflow:
+   // Send audio data first to get conversationId, then use it to send animation data
+   const conversationId = avatarView.controller.yieldAudioData(audioData, false)
+   avatarView.controller.yieldFramesData(animationDataArray, conversationId) // animationDataArray: (Uint8Array | ArrayBuffer)[]
+ })
  ```
 
  ### Complete Examples
 
- Check the example code in the GitHub repository for complete usage flows for both modes.
-
- **Example Project:** [Avatarkit-web-demo](https://github.com/spatialwalk/Avatarkit-web-demo)
-
- This repository contains complete examples for Vanilla JS, Vue 3, and React, demonstrating:
- - Network mode: Real-time audio input with automatic animation data reception
- - External data mode: Custom data sources with manual audio/animation data management
+ This SDK supports two usage modes:
+ - SDK mode: Real-time audio input with automatic animation data reception
+ - Host mode: Custom data sources with manual audio/animation data management
 
  ## 🏗️ Architecture Overview
 
@@ -110,47 +200,60 @@ The SDK uses a three-layer architecture for clear separation of concerns:
 
  1. **Rendering Layer (AvatarView)** - Responsible for 3D rendering only
  2. **Playback Layer (AvatarController)** - Manages audio/animation synchronization and playback
- 3. **Network Layer (NetworkLayer)** - Handles WebSocket communication (only in network mode)
+ 3. **Network Layer** - Handles WebSocket communication (only in SDK mode, internal implementation)
 
  ### Core Components
 
- - **AvatarKit** - SDK initialization and management
- - **AvatarManager** - Character resource loading and management
+ - **AvatarSDK** - SDK initialization and management
+ - **AvatarManager** - Avatar resource loading and management
  - **AvatarView** - 3D rendering view (rendering layer)
  - **AvatarController** - Audio/animation playback controller (playback layer)
- - **NetworkLayer** - WebSocket communication (network layer, automatically composed in network mode)
- - **AvatarCoreAdapter** - WASM module adapter
 
  ### Playback Modes
 
- The SDK supports two playback modes, configured when creating `AvatarView`:
+ The SDK supports two playback modes, configured in `AvatarSDK.initialize()`:
 
- #### 1. Network Mode (Default)
+ #### 1. SDK Mode (Default)
+ - Configured via `drivingServiceMode: DrivingServiceMode.sdk` in `AvatarSDK.initialize()`
  - SDK handles WebSocket communication automatically
  - Send audio data via `AvatarController.send()`
  - SDK receives animation data from backend and synchronizes playback
  - Best for: Real-time audio input scenarios
 
- #### 2. External Data Mode
- - External components manage their own network/data fetching
- - External components provide both audio and animation data
+ #### 2. Host Mode
+ - Configured via `drivingServiceMode: DrivingServiceMode.host` in `AvatarSDK.initialize()`
+ - Host application manages its own network/data fetching
+ - Host application provides both audio and animation data
  - SDK only handles synchronized playback
  - Best for: Custom data sources, pre-recorded content, or custom network implementations
 
+ **Note:** The playback mode is determined by `drivingServiceMode` in the `AvatarSDK.initialize()` configuration.
+
+ ### Fallback Mechanism
+
+ The SDK includes a fallback mechanism to ensure audio playback continues even when animation data is unavailable:
+
+ - **SDK Mode Connection Failure**: If the WebSocket connection fails to establish within 15 seconds, the SDK automatically enters fallback mode. In this mode, audio data can still be sent and will play normally, even though no animation data will be received from the server. This ensures that audio playback is not interrupted when the service connection fails.
+ - **SDK Mode Server Error**: If the server returns an error after the connection is established, the SDK automatically enters audio-only mode for that session and continues playing audio independently.
+ - **Host Mode**: If empty animation data is provided (empty array or undefined), the SDK automatically enters audio-only mode.
+ - Once in audio-only mode, any subsequent animation data for that session is ignored, and only audio continues playing.
+ - Fallback mode is interruptible, just like normal playback.
+ - Connection state callbacks (`onConnectionState`) notify you when a connection fails or times out, allowing you to handle the fallback state appropriately.
+
  ### Data Flow
 
- #### Network Mode Flow
+ #### SDK Mode Flow
 
  ```
  User audio input (16kHz mono PCM16)
 
  AvatarController.send()
 
- NetworkLayer → WebSocket → Backend processing
+ WebSocket → Backend processing
 
  Backend returns animation data (FLAME keyframes)
 
- NetworkLayer → AvatarController → AnimationPlayer
+ AvatarController → AnimationPlayer
 
  FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
 
@@ -159,15 +262,14 @@ AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
  RenderSystem → WebGPU/WebGL → Canvas rendering
  ```
 
- #### External Data Mode Flow
+ #### Host Mode Flow
 
  ```
  External data source (audio + animation)
 
- AvatarController.play(initialAudio, initialKeyframes) // Start playback
+ AvatarController.yieldAudioData(audioChunk) // Returns conversationId
 
- AvatarController.sendAudioChunk() // Stream additional audio
- AvatarController.sendKeyframes() // Stream additional animation
+ AvatarController.yieldFramesData(keyframesDataArray, conversationId) // keyframesDataArray: (Uint8Array | ArrayBuffer)[] - each element is a protobuf encoded Message
 
  AvatarController → AnimationPlayer (synchronized playback)
 
@@ -178,205 +280,348 @@ AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
 
  RenderSystem → WebGPU/WebGL → Canvas rendering
  ```
 
- **Note:**
- - In network mode, users provide audio data, SDK handles network communication and animation data reception
- - In external data mode, users provide both audio and animation data, SDK handles synchronized playback only
-
  ### Audio Format Requirements
 
- **⚠️ Important:** The SDK requires audio data to be in **16kHz mono PCM16** format:
+ **⚠️ Important:** The SDK requires audio data to be in **mono PCM16** format:
 
- - **Sample Rate**: 16kHz (16000 Hz) - This is a backend requirement
- - **Channels**: Mono (single channel)
+ - **Sample Rate**: Configurable via `audioFormat.sampleRate` in SDK initialization (default: 16000 Hz)
+   - Supported sample rates: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
+   - The configured sample rate is used for both audio recording and playback
+ - **Channels**: Mono (single channel) - Fixed to 1 channel
  - **Format**: PCM16 (16-bit signed integer, little-endian)
  - **Byte Order**: Little-endian
 
  **Audio Data Format:**
- - Each sample is 2 bytes (16-bit)
+ - Each sample is 2 bytes (16-bit signed integer, little-endian)
  - Audio data should be provided as `ArrayBuffer` or `Uint8Array`
- - For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
+ - For example, with a 16kHz sample rate: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
+ - With a 48kHz sample rate: 1 second of audio = 48000 samples × 2 bytes = 96000 bytes
+
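Editor's note: the byte arithmetic above generalizes to any supported sample rate; a one-line helper (hypothetical, not an SDK API) makes the relationship explicit:

```typescript
// Bytes needed for `seconds` of mono PCM16 audio: sampleRate samples/sec × 2 bytes/sample.
// Hypothetical helper for illustration, not an SDK API.
function pcm16ByteLength(seconds: number, sampleRate: number): number {
  const bytesPerSample = 2 // 16-bit signed integer
  return Math.round(seconds * sampleRate) * bytesPerSample
}
```

`pcm16ByteLength(1, 16000)` gives 32000 and `pcm16ByteLength(1, 48000)` gives 96000, matching the figures above.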
+ **Audio Data Source:**
+ The `audioData` parameter represents raw PCM16 audio samples in the configured sample rate and mono format. Common audio sources include:
+ - **PCM files**: Raw PCM16 files can be directly read as `ArrayBuffer` or `Uint8Array` and sent to the SDK (ensure the sample rate matches the configuration)
+ - **WAV files**: WAV files contain PCM16 audio data in their data chunk. After extracting the PCM data from the WAV container, it can be sent to the SDK (may require resampling if the sample rate differs)
+ - **MP3 files**: MP3 files need to be decoded first (e.g., using `AudioContext.decodeAudioData()` or a decoder library), then converted from the decoded format to PCM16 before sending to the SDK
+ - **Microphone input**: Real-time microphone audio needs to be captured and converted to PCM16 format at the configured sample rate before sending
+ - **Other audio sources**: Any audio source must be converted to mono PCM16 format at the configured sample rate before sending
+
+ **Example: Processing WAV and MP3 Files:**
+ ```typescript
+ // WAV file processing
+ async function processWAVFile(wavFile: File): Promise<ArrayBuffer> {
+   const arrayBuffer = await wavFile.arrayBuffer()
+   const view = new DataView(arrayBuffer)
+
+   // Check RIFF header
+   if (view.getUint32(0, true) !== 0x46464952) { // "RIFF"
+     throw new Error('Invalid WAV file')
+   }
+
+   // Skip header (usually 44 bytes for a standard WAV file)
+   const dataOffset = 44 // Standard WAV header size
+   // For non-standard WAV files, you may need to search for the "data" chunk
+   // This is a simplified example - production code should parse chunks properly
+
+   return arrayBuffer.slice(dataOffset)
+ }
+
+ // MP3 file processing
+ async function processMP3File(mp3File: File, targetSampleRate: number): Promise<ArrayBuffer> {
+   const arrayBuffer = await mp3File.arrayBuffer()
+   const audioContext = new AudioContext({ sampleRate: targetSampleRate })
+
+   // Decode MP3 to an AudioBuffer (resampled to the context's sample rate)
+   const audioBuffer = await audioContext.decodeAudioData(arrayBuffer.slice(0))
+
+   // Convert the AudioBuffer to a PCM16 ArrayBuffer
+   const length = audioBuffer.length
+   const channels = audioBuffer.numberOfChannels
+   const pcm16Buffer = new ArrayBuffer(length * 2)
+   const pcm16View = new DataView(pcm16Buffer)
+
+   // Mix down to mono if stereo
+   const sourceData = channels === 1
+     ? audioBuffer.getChannelData(0)
+     : new Float32Array(length)
+
+   if (channels > 1) {
+     const leftChannel = audioBuffer.getChannelData(0)
+     const rightChannel = audioBuffer.getChannelData(1)
+     for (let i = 0; i < length; i++) {
+       sourceData[i] = (leftChannel[i] + rightChannel[i]) / 2 // Mix to mono
+     }
+   }
+
+   // Convert float32 (-1.0 to 1.0) to int16 (-32768 to 32767)
+   for (let i = 0; i < length; i++) {
+     const sample = Math.max(-1, Math.min(1, sourceData[i])) // Clamp
+     const int16Sample = sample < 0 ? sample * 0x8000 : sample * 0x7FFF
+     pcm16View.setInt16(i * 2, int16Sample, true) // little-endian
+   }
+
+   await audioContext.close()
+   return pcm16Buffer
+ }
+
+ // Usage example:
+ // const wavPcmData = await processWAVFile(wavFile)
+ // avatarView.controller.send(wavPcmData, false)
+ //
+ // const mp3PcmData = await processMP3File(mp3File, 16000) // 16kHz
+ // avatarView.controller.send(mp3PcmData, false)
+ ```
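Editor's note: the simplified WAV reader above assumes a 44-byte header, but real-world files often carry extra chunks (`LIST`, `fact`) before `data`. A chunk-walking sketch (hypothetical helper, not an SDK API) locates the PCM payload robustly:

```typescript
// Walk RIFF chunks to locate the "data" chunk instead of assuming a 44-byte header.
// Returns the byte offset and length of the PCM payload. Hypothetical helper, not an SDK API.
function findWavDataChunk(buffer: ArrayBuffer): { offset: number; length: number } {
  const view = new DataView(buffer)
  if (view.getUint32(0, true) !== 0x46464952) throw new Error('Not a RIFF file') // "RIFF"
  if (view.getUint32(8, true) !== 0x45564157) throw new Error('Not a WAVE file') // "WAVE"
  let pos = 12
  while (pos + 8 <= view.byteLength) {
    const id = view.getUint32(pos, true)
    const size = view.getUint32(pos + 4, true)
    if (id === 0x61746164) { // "data"
      return { offset: pos + 8, length: size }
    }
    pos += 8 + size + (size & 1) // chunks are word-aligned
  }
  throw new Error('No data chunk found')
}
```

`buffer.slice(offset, offset + length)` then yields exactly the PCM16 payload, regardless of header layout.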
 
  **Resampling:**
- - If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending to the SDK
+ - If your audio source is at a different sample rate, you must resample it to match the configured sample rate before sending to the SDK
  - For high-quality resampling, we recommend using Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
  - See example projects for resampling implementation
 
+ **Configuration Example:**
+ ```typescript
+ const configuration: Configuration = {
+   environment: Environment.cn,
+   audioFormat: {
+     channelCount: 1, // Fixed to 1 (mono)
+     sampleRate: 48000 // Choose from: 8000, 16000, 22050, 24000, 32000, 44100, 48000
+   }
+ }
+ ```
+
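Editor's note: to see the rate conversion itself, here is a minimal linear-interpolation resampler. This is a sketch only: it applies no anti-aliasing filter, so prefer the `OfflineAudioContext` approach recommended above for real audio. `resampleLinear` is a hypothetical name, not an SDK API:

```typescript
// Resample Float32 samples from one rate to another using linear interpolation.
// Illustrative only - no anti-aliasing, so expect artifacts when downsampling real audio.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice()
  const outLength = Math.round(input.length * toRate / fromRate)
  const output = new Float32Array(outLength)
  const ratio = fromRate / toRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, input.length - 1)
    const frac = pos - i0
    output[i] = input[i0] * (1 - frac) + input[i1] * frac
  }
  return output
}
```

The Float32 output still needs the float-to-int16 conversion shown in the MP3 example before being sent to the SDK.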
  ## 📚 API Reference
205
393
 
206
- ### AvatarKit
394
+ ### AvatarSDK
207
395
 
208
396
  The core management class of the SDK, responsible for initialization and global configuration.
209
397
 
210
398
  ```typescript
211
399
  // Initialize SDK
212
- await AvatarKit.initialize(appId: string, configuration: Configuration)
400
+ await AvatarSDK.initialize(appId: string, configuration: Configuration)
213
401
 
214
402
  // Check initialization status
215
- const isInitialized = AvatarKit.isInitialized
403
+ const isInitialized = AvatarSDK.isInitialized
404
+
405
+ // Get initialized app ID
406
+ const appId = AvatarSDK.appId
407
+
408
+ // Get configuration
409
+ const config = AvatarSDK.configuration
410
+
411
+ // Set sessionToken (if needed, call separately)
412
+ AvatarSDK.setSessionToken('your-session-token')
413
+
414
+ // Set userId (optional, for telemetry)
415
+ AvatarSDK.setUserId('user-id')
416
+
417
+ // Get sessionToken
418
+ const sessionToken = AvatarSDK.sessionToken
419
+
420
+ // Get userId
421
+ const userId = AvatarSDK.userId
422
+
423
+ // Get SDK version
424
+ const version = AvatarSDK.version
216
425
 
217
426
  // Cleanup resources (must be called when no longer in use)
218
- AvatarKit.cleanup()
427
+ AvatarSDK.cleanup()
219
428
  ```
220
429
 
221
430
  ### AvatarManager
222
431
 
223
- Character resource manager, responsible for downloading, caching, and loading character data.
432
+ Avatar resource manager, responsible for downloading, caching, and loading avatar data. Use the singleton instance via `AvatarManager.shared`.
224
433
 
225
434
  ```typescript
226
- const manager = new AvatarManager()
435
+ // Get singleton instance
436
+ const manager = AvatarManager.shared
227
437
 
228
- // Load character
438
+ // Load avatar
229
439
  const avatar = await manager.load(
230
440
  characterId: string,
231
441
  onProgress?: (progress: LoadProgressInfo) => void
232
442
  )
233
443
 
234
444
  // Clear cache
235
- manager.clearCache()
445
+ manager.clearAll()
236
446
  ```
237
447
 
238
448
  ### AvatarView
239
449
 
240
450
  3D rendering view (rendering layer), responsible for 3D rendering only. Internally automatically creates and manages `AvatarController`.
241
451
 
242
- **⚠️ Important Limitation:** Currently, the SDK only supports one AvatarView instance at a time. If you need to switch characters, you must first call the `dispose()` method to clean up the current AvatarView, then create a new instance.
452
+ ```typescript
453
+ constructor(avatar: Avatar, container: HTMLElement)
454
+ ```
243
455
 
244
- **Playback Mode Configuration:**
456
+ **Parameters:**
457
+ - `avatar`: Avatar instance
458
+ - `container`: Canvas container element (required)
459
+ - Canvas automatically uses the full size of the container (width and height)
460
+ - Canvas aspect ratio adapts to container size - set container size to control aspect ratio
461
+ - Canvas will be automatically added to the container
462
+ - SDK automatically handles resize events via ResizeObserver
463
+
464
+ **Playback Mode:**
465
+ - The playback mode is determined by `drivingServiceMode` in `AvatarSDK.initialize()` configuration
245
466
  - The playback mode is fixed when creating `AvatarView` and persists throughout its lifecycle
246
467
  - Cannot be changed after creation
247
468
 
248
469
  ```typescript
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
-
  // Create view (Canvas is automatically added to container)
- // Network mode (default)
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar: Avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.network // Optional, default is 'network'
- })
-
- // External data mode
- const avatarView = new AvatarView(avatar: Avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.external
- })
+ const avatarView = new AvatarView(avatar, container)
 
- // Get Canvas element
- const canvas = avatarView.getCanvas()
+ // Wait for the first frame to render
+ avatarView.onFirstRendering = () => {
+   // First frame rendered
+ }
 
- // Get playback mode
- const mode = avatarView.playbackMode // 'network' | 'external'
+ // Get the current avatar transform (position and scale)
+ const currentTransform = avatarView.transform // { x: number, y: number, scale: number }
 
- // Update camera configuration
- avatarView.updateCameraConfig(cameraConfig: CameraConfig)
+ // Set the transform
+ avatarView.transform = { x, y, scale }
+ // - x: horizontal offset in normalized coordinates (-1 = left edge, 0 = center, 1 = right edge)
+ // - y: vertical offset in normalized coordinates (-1 = bottom edge, 0 = center, 1 = top edge)
+ // - scale: scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
 
- // Cleanup resources (must be called before switching characters)
+ // Clean up resources (must be called before switching avatars)
  avatarView.dispose()
  ```
 
- **Character Switching Example:**
+ **Avatar Switching Example:**
 
  ```typescript
- // Before switching characters, must clean up old AvatarView first
+ // To switch avatars, simply dispose the old view and create a new one
  if (currentAvatarView) {
    currentAvatarView.dispose()
-   currentAvatarView = null
  }
 
- // Load new character
+ // Load new avatar
  const newAvatar = await avatarManager.load('new-character-id')
 
- // Create new AvatarView (with same or different playback mode)
- currentAvatarView = new AvatarView(newAvatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.network
- })
+ // Create new AvatarView
+ currentAvatarView = new AvatarView(newAvatar, container)
 
- // Network mode: start connection
- if (currentAvatarView.playbackMode === AvatarPlaybackMode.network) {
-   await currentAvatarView.avatarController.start()
- }
+ // SDK mode: start the connection (throws an error if not in SDK mode)
+ await currentAvatarView.controller.start()
  ```
 
  ### AvatarController
 
- Audio/animation playback controller (playback layer), manages synchronized playback of audio and animation. Automatically composes `NetworkLayer` in network mode.
+ Audio/animation playback controller (playback layer) that manages synchronized playback of audio and animation. In SDK mode, it handles WebSocket communication automatically.
 
  **Two Usage Patterns:**
 
- #### Network Mode Methods
+ #### SDK Mode Methods
 
  ```typescript
- // Start WebSocket service
- await avatarView.avatarController.start()
-
- // Send audio data (SDK handles receiving animation data automatically)
- avatarView.avatarController.send(audioData: ArrayBuffer, end: boolean)
- // audioData: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
- // - Sample rate: 16kHz (16000 Hz) - backend requirement
- // - Format: PCM16 (16-bit signed integer, little-endian)
- // - Channels: Mono (single channel)
- // - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
- // end: false (default) - Normal audio data sending, server will accumulate audio data, automatically returns animation data and starts synchronized playback of animation and audio after accumulating enough data
- // end: true - Immediately return animation data, no longer accumulating, used for ending current conversation or scenarios requiring immediate response
+ // ⚠️ CRITICAL: Initialize the audio context first (MUST be called from a user gesture).
+ // This method must run inside a user gesture event handler (click, touchstart, etc.)
+ // to satisfy browser autoplay policies; calling it outside a user gesture will fail.
+ // All audio operations (start, send, etc.) require prior initialization.
+ button.addEventListener('click', async () => {
+   // Initialize the audio context - must be in a user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // Start the WebSocket service
+   await avatarView.controller.start()
+
+   // Send audio data (must be mono PCM16 at the configured sample rate)
+   const conversationId = avatarView.controller.send(audioData: ArrayBuffer, end: boolean)
+   // Returns: conversationId - ID of this conversation session
+   // end: false (default) - continue sending audio data for the current conversation
+   // end: true - mark the end of the current conversation round; after end=true, sending new audio data interrupts any ongoing playback from the previous round
+ })
 
  // Close WebSocket service
- avatarView.avatarController.close()
+ avatarView.controller.close()
  ```
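Audio captured via the Web Audio API arrives as `Float32Array` samples, while `send()` expects raw little-endian PCM16 bytes. A minimal conversion sketch (the `floatTo16BitPCM` helper is illustrative, not part of the SDK):

```typescript
// Convert Float32 samples in [-1, 1] (e.g. from AudioBuffer.getChannelData)
// to 16-bit little-endian PCM, the byte layout expected for mono audio.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // little-endian
  }
  return buffer
}
```

The resulting `ArrayBuffer` could then be passed to `controller.send(pcmBuffer, false)`.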
 
- #### External Data Mode Methods
+ #### Host Mode Methods
 
  ```typescript
- // Start playback with initial audio and animation data
- await avatarView.avatarController.play(
-   initialAudioChunks?: Array<{ data: Uint8Array, isLast: boolean }>, // Initial audio chunks (16kHz mono PCM16)
-   initialKeyframes?: any[] // Initial animation keyframes (obtained from your service)
- )
+ // ⚠️ CRITICAL: Initialize the audio context first (MUST be called from a user gesture).
+ // This method must run inside a user gesture event handler (click, touchstart, etc.)
+ // to satisfy browser autoplay policies; calling it outside a user gesture will fail.
+ // All audio operations (yieldAudioData, yieldFramesData, etc.) require prior initialization.
+ button.addEventListener('click', async () => {
+   // Initialize the audio context - must be in a user gesture context
+   await avatarView.controller.initializeAudioContext()
+
+   // Stream audio chunks (must be mono PCM16 at the configured sample rate)
+   const conversationId = avatarView.controller.yieldAudioData(
+     data: Uint8Array, // Audio chunk data (PCM16 format)
+     isLast: boolean = false // Whether this is the last chunk
+   )
+   // Returns: conversationId - ID of this audio session
+
+   // Stream animation keyframes (requires the conversationId from the audio data)
+   avatarView.controller.yieldFramesData(
+     keyframesDataArray: (Uint8Array | ArrayBuffer)[], // Animation keyframe binary data (each element is a protobuf-encoded Message)
+     conversationId: string // Conversation ID (required)
+   )
+ })
+ ```
 
- // Stream additional audio chunks (after play() is called)
- avatarView.avatarController.sendAudioChunk(
-   data: Uint8Array, // Audio chunk data
-   isLast: boolean = false // Whether this is the last chunk
- )
+ **⚠️ Important: Conversation ID (conversationId) Management**
 
- // Stream additional animation keyframes (after play() is called)
- avatarView.avatarController.sendKeyframes(
-   keyframes: any[] // Additional animation keyframes (obtained from your service)
- )
- ```
+ **SDK Mode:**
+ - `send()` returns a conversationId that distinguishes each conversation round
+ - `end=true` marks the end of a conversation round
+
+ **Host Mode:**
+ - `yieldAudioData()` returns a conversationId (one is generated automatically when a new session starts)
+ - `yieldFramesData()` requires a valid conversationId parameter
+ - Animation data with a mismatched conversationId will be **discarded**
+ - Use `getCurrentConversationId()` to retrieve the current active conversationId
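In Host mode, TTS output often arrives as one large PCM16 buffer; splitting it into fixed-duration chunks aligned to sample boundaries makes it easy to feed `yieldAudioData()`. A sketch (the `chunkPcm16` helper, the 100 ms chunk size, and the 16 kHz default are illustrative assumptions):

```typescript
// Split a mono PCM16 byte buffer into fixed-duration chunks for streaming.
// Chunk boundaries fall on whole samples (2 bytes each); the final chunk
// carries isLast = true.
function chunkPcm16(
  pcm: Uint8Array,
  sampleRate = 16000, // should match the configured audioFormat.sampleRate
  chunkMs = 100,
): Array<{ data: Uint8Array; isLast: boolean }> {
  const bytesPerChunk = Math.floor((sampleRate * chunkMs) / 1000) * 2
  const chunks: Array<{ data: Uint8Array; isLast: boolean }> = []
  for (let offset = 0; offset < pcm.length; offset += bytesPerChunk) {
    const end = Math.min(offset + bytesPerChunk, pcm.length)
    chunks.push({ data: pcm.subarray(offset, end), isLast: end === pcm.length })
  }
  return chunks
}
```

Each chunk could then be forwarded as `controller.yieldAudioData(chunk.data, chunk.isLast)`.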
 
  #### Common Methods (Both Modes)
 
  ```typescript
  // Interrupt current playback (stops and clears data)
  avatarView.avatarController.interrupt()
 
  // Clear all data and resources
  avatarView.avatarController.clear()
 
- // Get connection state (network mode only)
- const isConnected = avatarView.avatarController.connected
-
- // Start service (network mode only)
- await avatarView.avatarController.start()
-
- // Close service (network mode only)
- avatarView.avatarController.close()
+ // Get current conversation ID (for Host mode)
+ const conversationId = avatarView.avatarController.getCurrentConversationId()
+ // Returns: the conversationId of the active audio session, or null if there is no active session
 
- // Get current avatar state
- const state = avatarView.avatarController.state
+ // Volume control (affects only the avatar audio player, not system volume)
+ avatarView.avatarController.setVolume(0.5) // Set volume to 50% (0.0 to 1.0)
+ const currentVolume = avatarView.avatarController.getVolume() // Get current volume (0.0 to 1.0)
 
  // Set event callbacks
- avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // Network mode only
- avatarView.avatarController.onAvatarState = (state: AvatarState) => {}
+ avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // SDK mode only
+ avatarView.avatarController.onConversationState = (state: ConversationState) => {}
  avatarView.avatarController.onError = (error: Error) => {}
  ```
 
+ #### Avatar Transform Methods
+
+ ```typescript
+ // Get the current avatar transform (position and scale within the canvas)
+ const currentTransform = avatarView.transform // { x: number, y: number, scale: number }
+
+ // Set the transform
+ avatarView.transform = { x, y, scale }
+ // - x: horizontal offset in normalized coordinates (-1 = left edge, 0 = center, 1 = right edge)
+ // - y: vertical offset in normalized coordinates (-1 = bottom edge, 0 = center, 1 = top edge)
+ // - scale: scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
+ // Examples:
+ avatarView.transform = { x: 0, y: 0, scale: 1.0 } // Centered, original size
+ avatarView.transform = { x: 0.5, y: 0, scale: 2.0 } // Offset halfway toward the right edge, double size
+ ```
+
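Since `x` and `y` are documented as normalized coordinates in [-1, 1], it can help to clamp values before assigning them. A small guard sketch (the `clampTransform` helper and the 0.01 minimum scale are illustrative assumptions, not SDK constraints):

```typescript
// Clamp a transform to the documented ranges before assigning it to
// avatarView.transform. The 0.01 lower bound on scale is an arbitrary
// guard against zero or negative scales, not an SDK limit.
interface AvatarTransform { x: number; y: number; scale: number }

function clampTransform(t: AvatarTransform): AvatarTransform {
  const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v))
  return {
    x: clamp(t.x, -1, 1),
    y: clamp(t.y, -1, 1),
    scale: Math.max(0.01, t.scale),
  }
}
```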
  **Important Notes:**
- - `start()` and `close()` are only available in network mode
- - `play()`, `sendAudioChunk()`, and `sendKeyframes()` are only available in external data mode
- - `interrupt()` and `clear()` are available in both modes
+ - `start()` and `close()` are only available in SDK mode
+ - `yieldAudioData()` and `yieldFramesData()` are only available in Host mode
+ - `pause()`, `resume()`, `interrupt()`, `clear()`, `getCurrentConversationId()`, `setVolume()`, and `getVolume()` are available in both modes
  - The playback mode is determined when creating `AvatarView` and cannot be changed
 
  ## 🔧 Configuration
@@ -386,40 +631,55 @@ avatarView.avatarController.onError = (error: Error) => {}
  ```typescript
  interface Configuration {
    environment: Environment
+   drivingServiceMode?: DrivingServiceMode // Optional, default is 'sdk' (SDK mode)
+   logLevel?: LogLevel // Optional, default is 'off' (no logs)
+   audioFormat?: AudioFormat // Optional, default is { channelCount: 1, sampleRate: 16000 }
+   characterApiBaseUrl?: string // Optional, internal debug config; can be ignored
  }
- ```
-
- **Description:**
- - `environment`: Specifies the environment (cn/us/test), SDK will automatically use the corresponding API address and WebSocket address based on the environment
- - `sessionToken`: Set separately via `AvatarKit.setSessionToken()`, not in Configuration
 
- ```typescript
- enum Environment {
-   cn = 'cn', // China region
-   us = 'us', // US region
-   test = 'test' // Test environment
+ interface AudioFormat {
+   readonly channelCount: 1 // Fixed to 1 (mono)
+   readonly sampleRate: number // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz; default: 16000
  }
  ```
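The configured sample rate determines how many bytes one second of mono PCM16 audio occupies, which is useful when sizing chunks or estimating buffer durations. A small sketch using the `AudioFormat` shape above (the helper names are illustrative):

```typescript
// Mono PCM16 uses 2 bytes per sample, so one second of audio occupies
// sampleRate * channelCount * 2 bytes (e.g. 32000 bytes at 16 kHz mono).
interface AudioFormat { readonly channelCount: 1; readonly sampleRate: number }

function pcm16BytesPerSecond(format: AudioFormat): number {
  return format.sampleRate * format.channelCount * 2
}

function pcm16DurationMs(byteLength: number, format: AudioFormat): number {
  return (byteLength / pcm16BytesPerSecond(format)) * 1000
}
```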
 
- ### AvatarViewOptions
+ ### LogLevel
+
+ Controls the verbosity of SDK logs:
 
  ```typescript
- interface AvatarViewOptions {
-   playbackMode?: AvatarPlaybackMode // Playback mode, default is 'network'
-   container?: HTMLElement // Canvas container element
+ enum LogLevel {
+   off = 'off', // Disable all logs
+   error = 'error', // Only error logs
+   warning = 'warning', // Warning and error logs
+   all = 'all' // All logs (info, warning, error)
  }
  ```
 
+ **Note:** `LogLevel.off` completely disables all logging, including error logs. Use it with caution in production environments.
+
  **Description:**
- - `playbackMode`: Specifies the playback mode (`'network'` or `'external'`), default is `'network'`
-   - `'network'`: SDK handles WebSocket communication, send audio via `send()`
-   - `'external'`: External components provide audio and animation data, SDK handles synchronized playback
- - `container`: Optional container element for Canvas, if not provided, Canvas will be created but not added to DOM
+ - `environment`: Specifies the environment (cn/intl); the SDK automatically uses the corresponding API address and WebSocket address
+ - `drivingServiceMode`: Specifies the driving service mode
+   - `DrivingServiceMode.sdk` (default): SDK mode; the SDK handles WebSocket communication automatically
+   - `DrivingServiceMode.host`: Host mode; the host application provides audio and animation data
+ - `logLevel`: Controls the verbosity of SDK logs
+   - `LogLevel.off` (default): Disable all logs
+   - `LogLevel.error`: Only error logs
+   - `LogLevel.warning`: Warning and error logs
+   - `LogLevel.all`: All logs (info, warning, error)
+ - `audioFormat`: Configures the audio sample rate and channel count
+   - `channelCount`: Fixed to 1 (mono)
+   - `sampleRate`: Audio sample rate in Hz (default: 16000)
+     - Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000
+     - The configured sample rate is used for both audio recording and playback
+ - `characterApiBaseUrl`: Internal debug config; can be ignored
+ - `sessionToken`: Set separately via `AvatarSDK.setSessionToken()`, not in Configuration
 
  ```typescript
- enum AvatarPlaybackMode {
-   network = 'network', // Network mode: SDK handles WebSocket communication
-   external = 'external' // External data mode: External provides data, SDK handles playback
+ enum Environment {
+   cn = 'cn', // China region
+   intl = 'intl' // International region
  }
  ```
 
@@ -450,16 +710,25 @@ enum ConnectionState {
  }
  ```
 
- ### AvatarState
+ ### ConversationState
 
  ```typescript
- enum AvatarState {
-   idle = 'idle', // Idle state, showing breathing animation
-   active = 'active', // Active, waiting for playable content
-   playing = 'playing' // Playing
+ enum ConversationState {
+   idle = 'idle', // Idle state (breathing animation)
+   playing = 'playing', // Playing state (active conversation)
+   pausing = 'pausing' // Pausing state (paused during playback)
  }
  ```
 
+ **State Description:**
+ - `idle`: The avatar is idle (breathing animation), waiting for a conversation to start
+ - `playing`: The avatar is playing conversation content (including during transition animations)
+ - `pausing`: Avatar playback is paused (e.g., when `end=false` and waiting for more audio data)
+
+ **Note:** During transition animations, the target state is reported immediately:
+ - When transitioning from `idle` to `playing`, the `playing` state is reported immediately
+ - When transitioning from `playing` to `idle`, the `idle` state is reported immediately
+
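A typical `onConversationState` handler just maps the state to UI feedback. A sketch (the `describeConversationState` helper and its labels are illustrative):

```typescript
// The string values mirror the ConversationState enum above; the labels
// are purely illustrative UI text.
type ConversationState = 'idle' | 'playing' | 'pausing'

function describeConversationState(state: ConversationState): string {
  switch (state) {
    case 'idle': return 'Waiting (breathing animation)'
    case 'playing': return 'Speaking'
    case 'pausing': return 'Paused, waiting for more audio'
  }
}
```

It could be wired up as `avatarView.avatarController.onConversationState = (s) => { statusEl.textContent = describeConversationState(s) }`.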
  ## 🎨 Rendering System
 
  The SDK supports two rendering backends:
@@ -469,70 +738,19 @@ The SDK supports two rendering backends:
 
  The rendering system automatically selects the best backend; no manual configuration is needed.
 
- ## 🔍 Debugging and Monitoring
-
- ### Logging System
-
- The SDK has a built-in complete logging system, supporting different levels of log output:
-
- ```typescript
- import { logger } from '@spatialwalk/avatarkit'
-
- // Set log level
- logger.setLevel('verbose') // 'basic' | 'verbose'
-
- // Manual log output
- logger.log('Info message')
- logger.warn('Warning message')
- logger.error('Error message')
- ```
-
- ### Performance Monitoring
-
- The SDK provides performance monitoring interfaces to monitor rendering performance:
-
- ```typescript
- // Get rendering performance statistics
- const stats = avatarView.getPerformanceStats()
-
- if (stats) {
-   console.log(`Render time: ${stats.renderTime.toFixed(2)}ms`)
-   console.log(`Sort time: ${stats.sortTime.toFixed(2)}ms`)
-   console.log(`Rendering backend: ${stats.backend}`)
-
-   // Calculate frame rate
-   const fps = 1000 / stats.renderTime
-   console.log(`Frame rate: ${fps.toFixed(2)} FPS`)
- }
-
- // Regular performance monitoring
- setInterval(() => {
-   const stats = avatarView.getPerformanceStats()
-   if (stats) {
-     // Send to monitoring service or display on UI
-     console.log('Performance:', stats)
-   }
- }, 1000)
- ```
-
- **Performance Statistics Description:**
- - `renderTime`: Total rendering time (milliseconds), includes sorting and GPU rendering
- - `sortTime`: Sorting time (milliseconds), uses Radix Sort algorithm to depth-sort point cloud
- - `backend`: Currently used rendering backend (`'webgpu'` | `'webgl'` | `null`)
-
  ## 🚨 Error Handling
 
- ### SPAvatarError
+ ### AvatarError
 
  The SDK uses custom error types that provide more detailed error information:
 
  ```typescript
- import { SPAvatarError } from '@spatialwalk/avatarkit'
+ import { AvatarError } from '@spatialwalk/avatarkit'
 
  try {
    await avatarView.avatarController.start()
  } catch (error) {
-   if (error instanceof SPAvatarError) {
+   if (error instanceof AvatarError) {
      console.error('SDK Error:', error.message, error.code)
    } else {
      console.error('Unknown error:', error)
@@ -553,15 +771,12 @@ avatarView.avatarController.onError = (error: Error) => {
 
  ### Lifecycle Management
 
- #### Network Mode Lifecycle
+ #### SDK Mode Lifecycle
 
  ```typescript
  // Initialize
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.network
- })
+ const avatarView = new AvatarView(avatar, container)
  await avatarView.avatarController.start()
 
  // Use
@@ -572,21 +787,16 @@ avatarView.avatarController.close()
  avatarView.dispose() // Automatically cleans up all resources
  ```
 
- #### External Data Mode Lifecycle
+ #### Host Mode Lifecycle
 
  ```typescript
  // Initialize
  const container = document.getElementById('avatar-container')
- const avatarView = new AvatarView(avatar, {
-   container: container,
-   playbackMode: AvatarPlaybackMode.external
- })
+ const avatarView = new AvatarView(avatar, container)
 
  // Use
- const initialAudioChunks = [{ data: audioData1, isLast: false }]
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
- avatarView.avatarController.sendAudioChunk(audioChunk, false)
- avatarView.avatarController.sendKeyframes(keyframes)
+ const conversationId = avatarView.avatarController.yieldAudioData(audioChunk, false)
+ avatarView.avatarController.yieldFramesData(keyframesDataArray, conversationId) // keyframesDataArray: (Uint8Array | ArrayBuffer)[]
 
  // Cleanup
  avatarView.avatarController.clear() // Clear all data and resources
@@ -594,63 +804,17 @@ avatarView.dispose() // Automatically cleans up all resources
  ```
 
  **⚠️ Important Notes:**
- - SDK currently only supports one AvatarView instance at a time
- - When switching characters, must first call `dispose()` to clean up old AvatarView, then create new instance
+ - When an AvatarView instance is no longer needed, you must call `dispose()` to properly clean up its resources
  - Not properly cleaning up may cause resource leaks and rendering errors
- - In network mode, call `close()` before `dispose()` to properly close WebSocket connections
- - In external data mode, call `clear()` before `dispose()` to clear all playback data
+ - In SDK mode, call `close()` before `dispose()` to properly close WebSocket connections
+ - In Host mode, call `clear()` before `dispose()` to clear all playback data
 
  ### Memory Optimization
 
  - SDK automatically manages WASM memory allocation
- - Supports dynamic loading/unloading of character and animation resources
+ - Supports dynamic loading/unloading of avatar and animation resources
  - Provides memory usage monitoring interface
 
- ### Audio Data Sending
-
- #### Network Mode
-
- The `send()` method receives audio data in `ArrayBuffer` format:
-
- **Audio Format Requirements:**
- - **Sample Rate**: 16kHz (16000 Hz) - **Backend requirement, must be exactly 16kHz**
- - **Format**: PCM16 (16-bit signed integer, little-endian)
- - **Channels**: Mono (single channel)
- - **Data Size**: Each sample is 2 bytes, so 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
-
- **Usage:**
- - `audioData`: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
- - `end=false` (default) - Normal audio data sending, server will accumulate audio data, automatically returns animation data and starts synchronized playback of animation and audio after accumulating enough data
- - `end=true` - Immediately return animation data, no longer accumulating, used for ending current conversation or scenarios requiring immediate response
- - **Important**: No need to wait for `end=true` to start playing, it will automatically start playing after accumulating enough audio data
-
- #### External Data Mode
-
- The `play()` method starts playback with initial data, then use `sendAudioChunk()` to stream additional audio:
-
- **Audio Format Requirements:**
- - Same as network mode: 16kHz mono PCM16 format
- - Audio data should be provided as `Uint8Array` in chunks with `isLast` flag
-
- **Usage:**
- ```typescript
- // Start playback with initial audio and animation data
- // Note: Audio and animation data should be obtained from your backend service
- const initialAudioChunks = [
-   { data: audioData1, isLast: false },
-   { data: audioData2, isLast: false }
- ]
- await avatarController.play(initialAudioChunks, initialKeyframes)
-
- // Stream additional audio chunks
- avatarController.sendAudioChunk(audioChunk, isLast)
- ```
-
- **Resampling (Both Modes):**
- - If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you **must** resample it to 16kHz before sending
- - For high-quality resampling, use Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
- - See example projects (`vanilla`, `react`, `vue`) for complete resampling implementation
-
  ## 🌐 Browser Compatibility
  - **Chrome/Edge** 90+ (WebGPU recommended)
@@ -669,6 +833,5 @@ Issues and Pull Requests are welcome!
  ## 📞 Support
 
  For questions, please contact:
- - Email: support@spavatar.com
- - Documentation: https://docs.spavatar.com
- - GitHub: https://github.com/spavatar/sdk
+ - Email: code@spatialwalk.net
+ - Documentation: https://docs.spatialreal.ai