@spatialwalk/avatarkit 1.0.0-beta.9 → 1.0.0-beta.90

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (99)
  1. package/CHANGELOG.md +695 -3
  2. package/README.md +650 -370
  3. package/dist/StreamingAudioPlayer-GS4x7i1m.js +638 -0
  4. package/dist/assets/cpu-benchmark-worker-C6iFEUSO.js +36 -0
  5. package/dist/{avatar_core_wasm.wasm → avatar_core_wasm-9834c91c.wasm} +0 -0
  6. package/dist/avatar_core_wasm-BY3MuXDA.js +2696 -0
  7. package/dist/core/Avatar.d.ts +4 -14
  8. package/dist/core/AvatarController.d.ts +104 -111
  9. package/dist/core/AvatarManager.d.ts +32 -12
  10. package/dist/core/AvatarSDK.d.ts +58 -0
  11. package/dist/core/AvatarView.d.ts +86 -132
  12. package/dist/demo/src/main.d.ts +1 -0
  13. package/dist/index-jdCd5L22.js +18087 -0
  14. package/dist/index.d.ts +2 -5
  15. package/dist/index.js +17 -18
  16. package/dist/next.d.ts +2 -0
  17. package/dist/performance/FrameRateMonitor.d.ts +85 -0
  18. package/dist/types/character-settings.d.ts +1 -1
  19. package/dist/types/character.d.ts +42 -16
  20. package/dist/types/index.d.ts +135 -45
  21. package/dist/vite.d.ts +19 -0
  22. package/next.d.ts +3 -0
  23. package/next.js +187 -0
  24. package/package.json +37 -8
  25. package/vite.d.ts +20 -0
  26. package/vite.js +126 -0
  27. package/dist/StreamingAudioPlayer-LW0pGK-E.js +0 -319
  28. package/dist/StreamingAudioPlayer-LW0pGK-E.js.map +0 -1
  29. package/dist/animation/AnimationWebSocketClient.d.ts +0 -50
  30. package/dist/animation/AnimationWebSocketClient.d.ts.map +0 -1
  31. package/dist/animation/utils/eventEmitter.d.ts +0 -13
  32. package/dist/animation/utils/eventEmitter.d.ts.map +0 -1
  33. package/dist/animation/utils/flameConverter.d.ts +0 -26
  34. package/dist/animation/utils/flameConverter.d.ts.map +0 -1
  35. package/dist/audio/AnimationPlayer.d.ts +0 -57
  36. package/dist/audio/AnimationPlayer.d.ts.map +0 -1
  37. package/dist/audio/StreamingAudioPlayer.d.ts +0 -123
  38. package/dist/audio/StreamingAudioPlayer.d.ts.map +0 -1
  39. package/dist/avatar_core_wasm-D4eEi7Eh.js +0 -1666
  40. package/dist/avatar_core_wasm-D4eEi7Eh.js.map +0 -1
  41. package/dist/config/app-config.d.ts +0 -44
  42. package/dist/config/app-config.d.ts.map +0 -1
  43. package/dist/config/constants.d.ts +0 -29
  44. package/dist/config/constants.d.ts.map +0 -1
  45. package/dist/config/sdk-config-loader.d.ts +0 -12
  46. package/dist/config/sdk-config-loader.d.ts.map +0 -1
  47. package/dist/core/Avatar.d.ts.map +0 -1
  48. package/dist/core/AvatarController.d.ts.map +0 -1
  49. package/dist/core/AvatarDownloader.d.ts +0 -100
  50. package/dist/core/AvatarDownloader.d.ts.map +0 -1
  51. package/dist/core/AvatarKit.d.ts +0 -66
  52. package/dist/core/AvatarKit.d.ts.map +0 -1
  53. package/dist/core/AvatarManager.d.ts.map +0 -1
  54. package/dist/core/AvatarView.d.ts.map +0 -1
  55. package/dist/core/NetworkLayer.d.ts +0 -59
  56. package/dist/core/NetworkLayer.d.ts.map +0 -1
  57. package/dist/generated/driveningress/v1/driveningress.d.ts +0 -80
  58. package/dist/generated/driveningress/v1/driveningress.d.ts.map +0 -1
  59. package/dist/generated/driveningress/v2/driveningress.d.ts +0 -81
  60. package/dist/generated/driveningress/v2/driveningress.d.ts.map +0 -1
  61. package/dist/generated/google/protobuf/struct.d.ts +0 -108
  62. package/dist/generated/google/protobuf/struct.d.ts.map +0 -1
  63. package/dist/generated/google/protobuf/timestamp.d.ts +0 -129
  64. package/dist/generated/google/protobuf/timestamp.d.ts.map +0 -1
  65. package/dist/index-8jCKHF1q.js +0 -6033
  66. package/dist/index-8jCKHF1q.js.map +0 -1
  67. package/dist/index.d.ts.map +0 -1
  68. package/dist/index.js.map +0 -1
  69. package/dist/renderer/RenderSystem.d.ts +0 -79
  70. package/dist/renderer/RenderSystem.d.ts.map +0 -1
  71. package/dist/renderer/covariance.d.ts +0 -13
  72. package/dist/renderer/covariance.d.ts.map +0 -1
  73. package/dist/renderer/renderer.d.ts +0 -8
  74. package/dist/renderer/renderer.d.ts.map +0 -1
  75. package/dist/renderer/sortSplats.d.ts +0 -12
  76. package/dist/renderer/sortSplats.d.ts.map +0 -1
  77. package/dist/renderer/webgl/reorderData.d.ts +0 -14
  78. package/dist/renderer/webgl/reorderData.d.ts.map +0 -1
  79. package/dist/renderer/webgl/webglRenderer.d.ts +0 -66
  80. package/dist/renderer/webgl/webglRenderer.d.ts.map +0 -1
  81. package/dist/renderer/webgpu/webgpuRenderer.d.ts +0 -54
  82. package/dist/renderer/webgpu/webgpuRenderer.d.ts.map +0 -1
  83. package/dist/types/character-settings.d.ts.map +0 -1
  84. package/dist/types/character.d.ts.map +0 -1
  85. package/dist/types/index.d.ts.map +0 -1
  86. package/dist/utils/animation-interpolation.d.ts +0 -17
  87. package/dist/utils/animation-interpolation.d.ts.map +0 -1
  88. package/dist/utils/cls-tracker.d.ts +0 -17
  89. package/dist/utils/cls-tracker.d.ts.map +0 -1
  90. package/dist/utils/error-utils.d.ts +0 -27
  91. package/dist/utils/error-utils.d.ts.map +0 -1
  92. package/dist/utils/logger.d.ts +0 -35
  93. package/dist/utils/logger.d.ts.map +0 -1
  94. package/dist/utils/reqId.d.ts +0 -20
  95. package/dist/utils/reqId.d.ts.map +0 -1
  96. package/dist/wasm/avatarCoreAdapter.d.ts +0 -188
  97. package/dist/wasm/avatarCoreAdapter.d.ts.map +0 -1
  98. package/dist/wasm/avatarCoreMemory.d.ts +0 -141
  99. package/dist/wasm/avatarCoreMemory.d.ts.map +0 -1
package/README.md CHANGED
@@ -1,13 +1,12 @@
1
- # SPAvatarKit SDK
1
+ # AvatarKit SDK
2
2
 
3
- Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supporting audio-driven animation rendering and high-quality 3D rendering.
3
+ Real-time virtual avatar rendering SDK for Web, supporting audio-driven animation and high-quality 3D rendering.
4
4
 
5
5
  ## 🚀 Features
6
6
 
7
- - **3D Gaussian Splatting Rendering** - Based on the latest point cloud rendering technology, providing high-quality 3D virtual avatars
8
- - **Audio-Driven Real-Time Animation Rendering** - Users provide audio data, SDK handles receiving animation data and rendering
9
- - **WebGPU/WebGL Dual Rendering Backend** - Automatically selects the best rendering backend for compatibility
10
- - **WASM High-Performance Computing** - Uses C++ compiled WebAssembly modules for geometric calculations
7
+ - **High-Quality 3D Rendering** - GPU-accelerated avatar rendering with automatic backend selection
8
+ - **Audio-Driven Real-Time Animation** - Send audio data, SDK handles animation and rendering
9
+ - **Multi-Avatar Support** - Support multiple avatar instances simultaneously, each with independent state and rendering
11
10
  - **TypeScript Support** - Complete type definitions and IntelliSense
12
11
  - **Modular Architecture** - Clear component separation, easy to integrate and extend
13
12
 
@@ -17,366 +16,720 @@ Real-time virtual avatar rendering SDK based on 3D Gaussian Splatting, supportin
17
16
  npm install @spatialwalk/avatarkit
18
17
  ```
19
18
 
19
+ ## 🚧 Release Gate (Hard Rule)
20
+
21
+ All releases must pass the gate checks before publishing. Do not publish via ad-hoc manual commands.
22
+
23
+ Required gate checks:
24
+
25
+ ```bash
26
+ pnpm typecheck
27
+ pnpm test
28
+ pnpm build
29
+ ./tools/check_perf_baseline_release_gate.sh
30
+ ```
31
+
32
+ If an iteration includes bugfixes, `docs/bugfix-history.md` must contain completed rows for each fix (test mapping + red/green evidence).
33
+
34
+ A hotfix bypass is allowed only in emergencies and must be recorded:
35
+
36
+ ```bash
37
+ HOTFIX_BYPASS=1 ./tools/check_perf_baseline_release_gate.sh
38
+ ```
39
+
40
+ ## 🧪 Benchmark Demo (Web SDK)
41
+
42
+ Use the dedicated benchmark demo (independent from `vanilla/`) for perf/render baseline runs:
43
+
44
+ ```bash
45
+ pnpm demo:benchmark
46
+ ```
47
+
48
+ ## 🚀 Demo Repository
49
+
50
+ <div align="center">
51
+
52
+ ### 📌 **Quick Start: Check Out Our Demo Repository**
53
+
54
+ We provide complete example code and best practices to help you quickly integrate the SDK.
55
+
56
+ **The demo repository includes:**
57
+ - ✅ Complete integration examples
58
+ - ✅ Usage examples for both SDK mode and Host mode
59
+ - ✅ Audio processing examples (PCM16, WAV, MP3, etc.)
60
+ - ✅ Vite configuration examples
61
+ - ✅ Next.js configuration examples
62
+ - ✅ Best practices for common scenarios
63
+
64
+ **[👉 View Demo Repository](https://github.com/spatialwalk/avatarkit-demo)** | *If not yet created, please contact the team*
65
+
66
+ </div>
67
+
68
+ ---
69
+
70
+ ## 🔧 Vite Configuration (Recommended)
71
+
72
+ If you are using Vite as your build tool, we strongly recommend our Vite plugin: it applies all necessary WASM configuration automatically, so you don't need to set anything up manually.
73
+
74
+ ### Using the Plugin
75
+
76
+ Add the plugin to `vite.config.ts`:
77
+
78
+ ```typescript
79
+ import { defineConfig } from 'vite'
80
+ import { avatarkitVitePlugin } from '@spatialwalk/avatarkit/vite'
81
+
82
+ export default defineConfig({
83
+ plugins: [
84
+ avatarkitVitePlugin(), // Just add this line
85
+ ],
86
+ })
87
+ ```
88
+
89
+ ### Plugin Features
90
+
91
+ The plugin automatically handles:
92
+
93
+ - ✅ **Development Server**: Automatically sets the correct MIME type (`application/wasm`) for WASM files
94
+ - ✅ **Build Time**: Automatically copies WASM files to `dist/assets/` directory
95
+ - ✅ **Cloudflare Pages**: Automatically generates `_headers` file to ensure WASM files use the correct MIME type
96
+ - ✅ **Vite Configuration**: Automatically configures `optimizeDeps`, `assetsInclude`, `assetsInlineLimit`, and other options
97
+
98
+ ### Manual Configuration (Without Plugin)
99
+
100
+ If you don't use the Vite plugin, you need to manually configure the following:
101
+
102
+ ```typescript
103
+ // vite.config.ts
104
+ export default defineConfig({
105
+ optimizeDeps: {
106
+ exclude: ['@spatialwalk/avatarkit'],
107
+ },
108
+ assetsInclude: ['**/*.wasm'],
109
+ build: {
110
+ assetsInlineLimit: 0,
111
+ rollupOptions: {
112
+ output: {
113
+ assetFileNames: (assetInfo) => {
114
+ if (assetInfo.name?.endsWith('.wasm')) {
115
+ return 'assets/[name][extname]'
116
+ }
117
+ return 'assets/[name]-[hash][extname]'
118
+ },
119
+ },
120
+ },
121
+ },
122
+ // Development server needs to manually configure middleware to set WASM MIME type
123
+ configureServer(server) {
124
+ server.middlewares.use((req, res, next) => {
125
+ if (req.url?.endsWith('.wasm')) {
126
+ res.setHeader('Content-Type', 'application/wasm')
127
+ }
128
+ next()
129
+ })
130
+ },
131
+ })
132
+ ```
133
+
134
+ ## 🔧 Next.js Configuration
135
+
136
+ For Next.js projects, use the `withAvatarkit` wrapper to automatically handle WASM file configuration with webpack.
137
+
138
+ ### Using the Plugin
139
+
140
+ Wrap your Next.js config in `next.config.mjs`:
141
+
142
+ ```javascript
143
+ import { withAvatarkit } from '@spatialwalk/avatarkit/next'
144
+
145
+ export default withAvatarkit({
146
+ // ...your existing Next.js config
147
+ })
148
+ ```
149
+
150
+ ### Plugin Features
151
+
152
+ The plugin automatically handles:
153
+
154
+ - ✅ **Path Fix**: Patches asset path resolution so WASM files are correctly loaded at `/_next/static/chunks/`
155
+ - ✅ **WASM Copying**: Copies `.wasm` files into `static/chunks/` via a custom webpack plugin (client build only)
156
+ - ✅ **Content-Type Headers**: Adds `application/wasm` response header for `/_next/static/chunks/*.wasm`
157
+ - ✅ **Config Chaining**: Preserves your existing `webpack` and `headers` configurations
158
+
159
+ ## 🔐 Authentication
160
+
161
+ All environments require an **App ID** and **Session Token** for authentication.
162
+
163
+ ### App ID
164
+
165
+ The App ID is used to identify your application. You can obtain your App ID by:
166
+
167
+ 1. **For Testing**: Use the default test App ID provided in demo repositories (paired with test Session Token, only works with publicly available test avatars like Rohan, Dr.Kellan, Priya, Josh, etc.)
168
+ 2. **For Production**: Visit the [Developer Platform](https://dash.spatialreal.ai) to create your own App and avatars. You will receive your own App ID after creating an App.
169
+
170
+ ### Session Token
171
+
172
+ The Session Token is required for authentication and must be obtained from your SDK provider.
173
+
174
+ **⚠️ Important Notes:**
175
+ - The Session Token must be valid and not expired
176
+ - In production applications, you **must** manually inject a valid Session Token obtained from your SDK provider
177
+ - The default Session Token provided in demo repositories is **only for demonstration purposes** and can only be used with test avatars
178
+ - If you want to create your own avatars and test them, please visit the [Developer Platform](https://dash.spatialreal.ai) to create your own App and generate Session Tokens
179
+
180
+ **How to Set Session Token:**
181
+
182
+ ```typescript
183
+ // Initialize SDK with App ID
184
+ await AvatarSDK.initialize('your-app-id', configuration)
185
+
186
+ // Set Session Token (can be called before or after initialization)
187
+ // If called before initialization, the token will be automatically set when you initialize the SDK
188
+ AvatarSDK.setSessionToken('your-session-token')
189
+
190
+ // Get current Session Token
191
+ const sessionToken = AvatarSDK.sessionToken
192
+ ```
193
+
194
+ **Token Management:**
195
+ - The Session Token can be set at any time using `AvatarSDK.setSessionToken(token)`
196
+ - If you set the token before initializing the SDK, it will be automatically applied during initialization
197
+ - If you set the token after initialization, it will be applied immediately
198
+ - Handle token refresh logic in your application as needed (e.g., when token expires)
199
+
200
+ **For Production Integration:**
201
+ - Obtain a valid Session Token from your SDK provider
202
+ - Store the token securely (never expose it in client-side code if possible)
203
+ - Implement token refresh logic to handle token expiration
204
+ - Use `AvatarSDK.setSessionToken(token)` to inject the token programmatically
205
+
20
206
  ## 🎯 Quick Start
21
207
 
208
+ ### ⚠️ Important: Audio Context Initialization
209
+
210
+ **Before using any audio-related features, you MUST initialize the audio context in a user gesture context** (e.g., `click`, `touchstart` event handlers). This is required by browser security policies. Calling `initializeAudioContext()` outside a user gesture will fail.
211
+
22
212
  ### Basic Usage
23
213
 
24
214
  ```typescript
25
215
  import {
26
- AvatarKit,
216
+ AvatarSDK,
27
217
  AvatarManager,
28
218
  AvatarView,
29
219
  Configuration,
30
- Environment
220
+ Environment,
221
+ DrivingServiceMode,
222
+ LogLevel
31
223
  } from '@spatialwalk/avatarkit'
32
224
 
33
225
  // 1. Initialize SDK
226
+
34
227
  const configuration: Configuration = {
35
- environment: Environment.test,
228
+ environment: Environment.cn,
229
+ drivingServiceMode: DrivingServiceMode.sdk, // Optional, 'sdk' is default
230
+ // - DrivingServiceMode.sdk: SDK mode - SDK handles network communication
231
+ // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
232
+ logLevel: LogLevel.off, // Optional, 'off' is default
233
+ // - LogLevel.off: Disable all logs
234
+ // - LogLevel.error: Only error logs
235
+ // - LogLevel.warning: Warning and error logs
236
+ // - LogLevel.all: All logs (info, warning, error)
237
+ audioFormat: { // Optional, default is { channelCount: 1, sampleRate: 16000 }
238
+ channelCount: 1, // Fixed to 1 (mono)
239
+ sampleRate: 16000 // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
240
+ }
241
+ // characterApiBaseUrl: 'https://custom-api.example.com' // Optional, internal debug config, can be ignored
36
242
  }
37
243
 
38
- await AvatarKit.initialize('your-app-id', configuration)
244
+ await AvatarSDK.initialize('your-app-id', configuration)
39
245
 
40
- // Set sessionToken (if needed, call separately)
41
- // AvatarKit.setSessionToken('your-session-token')
246
+ // Set Session Token (required for authentication)
247
+ // You must obtain a valid Session Token from your SDK provider
248
+ // See Authentication section above for more details
249
+ AvatarSDK.setSessionToken('your-session-token')
42
250
 
43
- // 2. Load character
44
- const avatarManager = new AvatarManager()
251
+ // 2. Load avatar
252
+ const avatarManager = AvatarManager.shared
45
253
  const avatar = await avatarManager.load('character-id', (progress) => {
46
254
  console.log(`Loading progress: ${progress.progress}%`)
47
255
  })
48
256
 
49
257
  // 3. Create view (automatically creates Canvas and AvatarController)
50
- // Network mode (default)
258
+ // The playback mode is determined by drivingServiceMode in AvatarSDK configuration
259
+ // - DrivingServiceMode.sdk: SDK mode - SDK handles network communication
260
+ // - DrivingServiceMode.host: Host mode - Host app provides audio and animation data
51
261
  const container = document.getElementById('avatar-container')
52
- const avatarView = new AvatarView(avatar, {
53
- container: container,
54
- playbackMode: 'network' // Optional, 'network' is default
262
+ const avatarView = new AvatarView(avatar, container)
263
+
264
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
265
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
266
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
267
+ button.addEventListener('click', async () => {
268
+ // Initialize audio context - MUST be in user gesture context
269
+ await avatarView.controller.initializeAudioContext()
270
+
271
+ // 5. Start real-time communication (SDK mode only)
272
+ await avatarView.controller.start()
273
+
274
+ // 6. Send audio data (SDK mode, must be mono PCM16 format matching configured sample rate)
275
+ // audioData: ArrayBuffer or Uint8Array containing PCM16 audio samples
276
+ // - PCM files: Can be directly read as ArrayBuffer
277
+ // - WAV files: Extract PCM data from WAV format (may require resampling)
278
+ // - MP3 files: Decode first (e.g., using AudioContext.decodeAudioData()), then convert to PCM16
279
+ const audioData = new ArrayBuffer(1024) // Placeholder: Replace with actual PCM16 audio data
280
+ avatarView.controller.send(audioData, false) // Send audio data
281
+ avatarView.controller.send(audioData, true) // end=true marks the end of current conversation round
55
282
  })
56
-
57
- // 4. Start real-time communication (network mode only)
58
- await avatarView.avatarController.start()
59
-
60
- // 5. Send audio data (network mode)
61
- // ⚠️ Important: Audio must be 16kHz mono PCM16 format
62
- // If audio is Uint8Array, you can use slice().buffer to convert to ArrayBuffer
63
- const audioUint8 = new Uint8Array(1024) // Example: 16kHz PCM16 audio data (512 samples = 1024 bytes)
64
- const audioData = audioUint8.slice().buffer // Simplified conversion, works for ArrayBuffer and SharedArrayBuffer
65
- avatarView.avatarController.send(audioData, false) // Send audio data, will automatically start playing after accumulating enough data
66
- avatarView.avatarController.send(audioData, true) // end=true means immediately return animation data, no longer accumulating
67
283
  ```
68
284
 
69
- ### External Data Mode Example
285
+ ### Host Mode Example
70
286
 
71
287
  ```typescript
72
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
73
288
 
74
- // 1-3. Same as network mode (initialize SDK, load character)
289
+ // 1-3. Same as SDK mode (initialize SDK, load avatar)
75
290
 
76
- // 3. Create view with external data mode
291
+ // 3. Create view with Host mode
77
292
  const container = document.getElementById('avatar-container')
78
- const avatarView = new AvatarView(avatar, {
79
- container: container,
80
- playbackMode: AvatarPlaybackMode.external
81
- })
82
-
83
- // 4. Start playback with initial data (obtained from your service)
84
- // Note: Audio and animation data should be obtained from your backend service
85
- const initialAudioChunks = [{ data: audioData1, isLast: false }, { data: audioData2, isLast: false }]
86
- const initialKeyframes = animationData1 // Animation keyframes from your service
87
-
88
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
89
-
90
- // 5. Stream additional data as needed
91
- avatarView.avatarController.sendAudioChunk(audioData3, false)
92
- avatarView.avatarController.sendKeyframes(animationData2)
293
+ const avatarView = new AvatarView(avatar, container)
294
+
295
+ // 4. ⚠️ CRITICAL: Initialize audio context (MUST be called in user gesture context)
296
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
297
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
298
+ button.addEventListener('click', async () => {
299
+ // Initialize audio context - MUST be in user gesture context
300
+ await avatarView.controller.initializeAudioContext()
301
+
302
+ // 5. Host Mode Workflow:
303
+ // Send audio data first to get conversationId, then use it to send animation data
304
+ const conversationId = avatarView.controller.yieldAudioData(audioData, false)
305
+ avatarView.controller.yieldFramesData(animationDataArray, conversationId) // animationDataArray: (Uint8Array | ArrayBuffer)[]
93
306
  ```
94
307
 
95
308
  ### Complete Examples
96
309
 
97
- Check the example code in the GitHub repository for complete usage flows for both modes.
98
-
99
- **Example Project:** [Avatarkit-web-demo](https://github.com/spatialwalk/Avatarkit-web-demo)
100
-
101
- This repository contains complete examples for Vanilla JS, Vue 3, and React, demonstrating:
102
- - Network mode: Real-time audio input with automatic animation data reception
103
- - External data mode: Custom data sources with manual audio/animation data management
310
+ This SDK supports two usage modes:
311
+ - SDK mode: Real-time audio input with automatic animation data reception
312
+ - Host mode: Custom data sources with manual audio/animation data management
104
313
 
105
314
  ## 🏗️ Architecture Overview
106
315
 
107
- ### Three-Layer Architecture
108
-
109
- The SDK uses a three-layer architecture for clear separation of concerns:
110
-
111
- 1. **Rendering Layer (AvatarView)** - Responsible for 3D rendering only
112
- 2. **Playback Layer (AvatarController)** - Manages audio/animation synchronization and playback
113
- 3. **Network Layer (NetworkLayer)** - Handles WebSocket communication (only in network mode)
114
-
115
316
  ### Core Components
116
317
 
117
- - **AvatarKit** - SDK initialization and management
118
- - **AvatarManager** - Character resource loading and management
119
- - **AvatarView** - 3D rendering view (rendering layer)
120
- - **AvatarController** - Audio/animation playback controller (playback layer)
121
- - **NetworkLayer** - WebSocket communication (network layer, automatically composed in network mode)
122
- - **AvatarCoreAdapter** - WASM module adapter
318
+ - **AvatarSDK** - SDK initialization and management
319
+ - **AvatarManager** - Avatar resource loading and management
320
+ - **AvatarView** - 3D rendering view
321
+ - **AvatarController** - Audio/animation playback controller
123
322
 
124
323
  ### Playback Modes
125
324
 
126
- The SDK supports two playback modes, configured when creating `AvatarView`:
325
+ The SDK supports two playback modes, configured in `AvatarSDK.initialize()`:
127
326
 
128
- #### 1. Network Mode (Default)
129
- - SDK handles WebSocket communication automatically
327
+ #### 1. SDK Mode (Default)
328
+ - Configured via `drivingServiceMode: DrivingServiceMode.sdk` in `AvatarSDK.initialize()`
329
+ - SDK handles network communication automatically
130
330
  - Send audio data via `AvatarController.send()`
131
331
  - SDK receives animation data from backend and synchronizes playback
132
332
  - Best for: Real-time audio input scenarios
133
333
 
134
- #### 2. External Data Mode
135
- - External components manage their own network/data fetching
136
- - External components provide both audio and animation data
334
+ #### 2. Host Mode
335
+ - Configured via `drivingServiceMode: DrivingServiceMode.host` in `AvatarSDK.initialize()`
336
+ - Host application manages its own network/data fetching
337
+ - Host application provides both audio and animation data
137
338
  - SDK only handles synchronized playback
138
339
  - Best for: Custom data sources, pre-recorded content, or custom network implementations
139
340
 
341
+ **Note:** The playback mode is determined by `drivingServiceMode` in `AvatarSDK.initialize()` configuration.
342
+
343
+ ### Fallback Mechanism
344
+
345
+ The SDK includes a fallback mechanism to ensure audio playback continues even when animation data is unavailable:
346
+
347
+ - **SDK Mode Connection Failure**: If connection fails to establish within 15 seconds, the SDK automatically enters fallback mode. Audio data can still be sent and will play normally, even though no animation data will be received. This ensures audio playback is not interrupted.
348
+ - **SDK Mode Server Error**: If the server returns an error after connection is established, the SDK automatically enters audio-only mode for that session.
349
+ - **Host Mode**: If empty animation data is provided (empty array or undefined), the SDK automatically enters audio-only mode.
350
+ - Once in audio-only mode, any subsequent animation data for that session will be ignored, and only audio will continue playing.
351
+ - The fallback mode is interruptible, just like normal playback mode.
352
+ - Connection state callbacks (`onConnectionState`) will notify you when connection fails or times out.
353
+
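The rules above can be modeled as a small state machine. The sketch below is illustrative only (the class and method names are ours, not SDK API); it encodes the documented behavior: a 15-second connection window, server errors dropping the session to audio-only, and audio-only sessions ignoring animation data.

```typescript
// Illustrative model of the documented fallback behavior. This is NOT the
// SDK's internal implementation, just a sketch of the rules described above.
type SessionState = 'connecting' | 'connected' | 'audio-only'

class FallbackModel {
  state: SessionState = 'connecting'
  private readonly connectTimeoutMs = 15_000 // documented 15s connection window

  // Called when the connection attempt resolves (success, failure, or timeout).
  onConnectResult(connected: boolean, elapsedMs: number): void {
    if (!connected || elapsedMs >= this.connectTimeoutMs) {
      this.state = 'audio-only' // fallback: audio keeps playing, no animation
    } else {
      this.state = 'connected'
    }
  }

  // A server error after connecting also drops the session to audio-only.
  onServerError(): void {
    this.state = 'audio-only'
  }

  // Once audio-only, subsequent animation data for the session is ignored.
  acceptsAnimationData(): boolean {
    return this.state === 'connected'
  }
}
```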
140
354
  ### Data Flow
141
355
 
142
- #### Network Mode Flow
356
+ #### SDK Mode Flow
143
357
 
144
358
  ```
145
- User audio input (16kHz mono PCM16)
146
-
147
- AvatarController.send()
359
+ Audio input (PCM16 mono)
148
360
 
149
- NetworkLayer → WebSocket → Backend processing
361
+ AvatarController.send()
150
362
 
151
- Backend returns animation data (FLAME keyframes)
363
+ Backend processing → Animation data
152
364
 
153
- NetworkLayer → AvatarController → AnimationPlayer
365
+ SDK synchronizes audio + animation playback
154
366
 
155
- FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
156
-
157
- AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
158
-
159
- RenderSystem → WebGPU/WebGL → Canvas rendering
367
+ GPU rendering → Canvas
160
368
  ```
161
369
 
162
- #### External Data Mode Flow
370
+ #### Host Mode Flow
163
371
 
164
372
  ```
165
373
  External data source (audio + animation)
166
374
 
167
- AvatarController.play(initialAudio, initialKeyframes) // Start playback
168
-
169
- AvatarController.sendAudioChunk() // Stream additional audio
170
- AvatarController.sendKeyframes() // Stream additional animation
171
-
172
- AvatarController → AnimationPlayer (synchronized playback)
375
+ AvatarController.yieldAudioData(audioChunk) returns conversationId
376
+ AvatarController.yieldFramesData(dataArray, conversationId)
173
377
 
174
- FLAME parameters → AvatarCore.computeFrameFlatFromParams() → Splat data
378
+ SDK synchronizes audio + animation playback
175
379
 
176
- AvatarController (playback loop) → AvatarView.renderRealtimeFrame()
177
-
178
- RenderSystem → WebGPU/WebGL → Canvas rendering
380
+ GPU rendering → Canvas
179
381
  ```
180
382
 
181
- **Note:**
182
- - In network mode, users provide audio data, SDK handles network communication and animation data reception
183
- - In external data mode, users provide both audio and animation data, SDK handles synchronized playback only
184
-
185
383
  ### Audio Format Requirements
186
384
 
187
- **⚠️ Important:** The SDK requires audio data to be in **16kHz mono PCM16** format:
385
+ **⚠️ Important:** The SDK requires audio data to be in **mono PCM16** format:
188
386
 
189
- - **Sample Rate**: 16kHz (16000 Hz) - This is a backend requirement
190
- - **Channels**: Mono (single channel)
387
+ - **Sample Rate**: Configurable via `audioFormat.sampleRate` in SDK initialization (default: 16000 Hz)
388
+ - Supported sample rates: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
389
+ - The configured sample rate will be used for both audio recording and playback
390
+ - **Channels**: Mono (single channel) - Fixed to 1 channel
191
391
  - **Format**: PCM16 (16-bit signed integer, little-endian)
192
392
  - **Byte Order**: Little-endian
193
393
 
194
394
  **Audio Data Format:**
195
- - Each sample is 2 bytes (16-bit)
395
+ - Each sample is 2 bytes (16-bit signed integer, little-endian)
196
396
  - Audio data should be provided as `ArrayBuffer` or `Uint8Array`
197
- - For example: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
397
+ - For example, with 16kHz sample rate: 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
398
+ - For 48kHz sample rate: 1 second of audio = 48000 samples × 2 bytes = 96000 bytes
399
+
400
+ **Audio Data Source:**
401
+ The `audioData` parameter represents raw PCM16 audio samples in the configured sample rate and mono format. Common audio sources include:
402
+ - **PCM files**: Raw PCM16 files can be directly read as `ArrayBuffer` or `Uint8Array` and sent to the SDK (ensure sample rate matches configuration)
403
+ - **WAV files**: WAV files contain PCM16 audio data in their data chunk. After extracting the PCM data from the WAV file format, it can be sent to the SDK (may require resampling if sample rate differs)
404
+ - **MP3 files**: MP3 files need to be decoded first (e.g., using `AudioContext.decodeAudioData()` or a decoder library), then converted from the decoded format to PCM16 before sending to the SDK
405
+ - **Microphone input**: Real-time microphone audio needs to be captured and converted to PCM16 format at the configured sample rate before sending
406
+ - **Other audio sources**: Any audio source must be converted to mono PCM16 format at the configured sample rate before sending
407
+
408
+ **Example: Processing WAV and MP3 Files:**
409
+ ```typescript
410
+ // WAV file processing
411
+ async function processWAVFile(wavFile: File): Promise<ArrayBuffer> {
412
+ const arrayBuffer = await wavFile.arrayBuffer()
413
+ const view = new DataView(arrayBuffer)
414
+
415
+ // WAV format: Skip header (usually 44 bytes for standard WAV)
416
+ // Check RIFF header
417
+ if (view.getUint32(0, true) !== 0x46464952) { // "RIFF"
418
+ throw new Error('Invalid WAV file')
419
+ }
420
+
421
+ // Find "data" chunk (offset may vary)
422
+ let dataOffset = 44 // Standard WAV header size
423
+ // For non-standard WAV files, you may need to search for "data" chunk
424
+ // This is a simplified example - production code should parse chunks properly
425
+
426
+ const pcmData = arrayBuffer.slice(dataOffset)
427
+ return pcmData
428
+ }
429
+
430
+ // MP3 file processing
431
+ async function processMP3File(mp3File: File, targetSampleRate: number): Promise<ArrayBuffer> {
432
+ const arrayBuffer = await mp3File.arrayBuffer()
433
+ const audioContext = new AudioContext({ sampleRate: targetSampleRate })
434
+
435
+ // Decode MP3 to AudioBuffer
436
+ const audioBuffer = await audioContext.decodeAudioData(arrayBuffer.slice(0))
437
+
438
+ // Convert AudioBuffer to PCM16 ArrayBuffer
439
+ const length = audioBuffer.length
440
+ const channels = audioBuffer.numberOfChannels
441
+ const pcm16Buffer = new ArrayBuffer(length * 2)
442
+ const pcm16View = new DataView(pcm16Buffer)
443
+
444
+ // Mix down to mono if stereo
445
+ const sourceData = channels === 1
446
+ ? audioBuffer.getChannelData(0)
447
+ : new Float32Array(length)
448
+
449
+ if (channels > 1) {
450
+ const leftChannel = audioBuffer.getChannelData(0)
451
+ const rightChannel = audioBuffer.getChannelData(1)
452
+ for (let i = 0; i < length; i++) {
453
+ sourceData[i] = (leftChannel[i] + rightChannel[i]) / 2 // Mix to mono
454
+ }
455
+ }
456
+
457
+ // Convert float32 (-1.0 to 1.0) to int16 (-32768 to 32767)
458
+ for (let i = 0; i < length; i++) {
459
+ const sample = Math.max(-1, Math.min(1, sourceData[i])) // Clamp
460
+ const int16Sample = sample < 0 ? sample * 0x8000 : sample * 0x7FFF
461
+ pcm16View.setInt16(i * 2, int16Sample, true) // little-endian
462
+ }
463
+
464
+ await audioContext.close() // close() returns a Promise
465
+ return pcm16Buffer
466
+ }
467
+
468
+ // Usage example:
469
+ // const wavPcmData = await processWAVFile(wavFile)
470
+ // avatarView.controller.send(wavPcmData, false)
471
+ //
472
+ // const mp3PcmData = await processMP3File(mp3File, 16000) // 16kHz
473
+ // avatarView.controller.send(mp3PcmData, false)
474
+ ```
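A quick way to sanity-check converted buffers is to relate byte length to duration: mono PCM16 uses 2 bytes per sample, so 1 second at 16 kHz is 32,000 bytes. A small illustrative helper (these function names are not part of the SDK):

```typescript
// Duration in seconds of a mono PCM16 buffer: 2 bytes per sample.
function pcm16DurationSeconds(byteLength: number, sampleRate: number): number {
  if (byteLength % 2 !== 0) throw new Error('PCM16 data must have an even byte length')
  return byteLength / 2 / sampleRate
}

// Expected byte length for a given duration of mono PCM16 audio.
function pcm16ByteLength(seconds: number, sampleRate: number): number {
  return Math.round(seconds * sampleRate) * 2
}
```

For example, `pcm16DurationSeconds(32000, 16000)` is `1`, matching the 16 kHz figure above.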
198
475
 
199
476
  **Resampling:**
200
- - If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you must resample it to 16kHz before sending to the SDK
477
+ - If your audio source is at a different sample rate, you must resample it to match the configured sample rate before sending to the SDK
201
478
  - For high-quality resampling, we recommend using the Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
202
479
  - See example projects for resampling implementation
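For illustration of what resampling involves, here is a naive linear-interpolation resampler plus a float-to-PCM16 converter. These are hypothetical helpers, not SDK APIs, and they apply no anti-aliasing filter, so for production downsampling prefer `OfflineAudioContext` as recommended above:

```typescript
// Naive mono resampler using linear interpolation. Illustrative only:
// no anti-aliasing filter is applied before decimation.
function resampleLinear(input: Float32Array, sourceRate: number, targetRate: number): Float32Array {
  if (sourceRate === targetRate) return input
  const outLength = Math.round(input.length * targetRate / sourceRate)
  const output = new Float32Array(outLength)
  const ratio = sourceRate / targetRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, input.length - 1)
    const frac = pos - i0
    output[i] = input[i0] * (1 - frac) + input[i1] * frac
  }
  return output
}

// Convert float32 samples (-1.0 to 1.0) to little-endian PCM16 bytes.
function floatToPCM16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true)
  }
  return buffer
}
```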
203
480
 
481
+ **Configuration Example:**
482
+ ```typescript
483
+ const configuration: Configuration = {
484
+ environment: Environment.cn,
485
+ audioFormat: {
486
+ channelCount: 1, // Fixed to 1 (mono)
487
+ sampleRate: 48000 // Choose from: 8000, 16000, 22050, 24000, 32000, 44100, 48000
488
+ }
489
+ }
490
+ ```
491
+
204
492
  ## 📚 API Reference
205
493
 
206
- ### AvatarKit
494
+ ### AvatarSDK
207
495
 
208
496
  The core management class of the SDK, responsible for initialization and global configuration.
209
497
 
210
498
  ```typescript
211
499
  // Initialize SDK
212
- await AvatarKit.initialize(appId: string, configuration: Configuration)
500
+ await AvatarSDK.initialize(appId: string, configuration: Configuration)
213
501
 
214
502
  // Check initialization status
215
- const isInitialized = AvatarKit.isInitialized
503
+ const isInitialized = AvatarSDK.isInitialized
504
+
505
+ // Get initialized app ID
506
+ const appId = AvatarSDK.appId
507
+
508
+ // Get configuration
509
+ const config = AvatarSDK.configuration
510
+
511
+ // Set Session Token (required for authentication)
512
+ // You must obtain a valid Session Token from your SDK provider
513
+ // See Authentication section for more details
514
+ AvatarSDK.setSessionToken('your-session-token')
515
+
516
+ // Set userId (optional, for telemetry)
517
+ AvatarSDK.setUserId('user-id')
518
+
519
+ // Get sessionToken
520
+ const sessionToken = AvatarSDK.sessionToken
521
+
522
+ // Get userId
523
+ const userId = AvatarSDK.userId
524
+
525
+ // Get SDK version
526
+ const version = AvatarSDK.version
216
527
 
217
528
  // Cleanup resources (must be called when no longer in use)
218
- AvatarKit.cleanup()
529
+ AvatarSDK.cleanup()
219
530
  ```
220
531
 
221
532
  ### AvatarManager
222
533
 
223
- Character resource manager, responsible for downloading, caching, and loading character data.
534
+ Avatar resource manager, responsible for downloading, caching, and loading avatar data. Use the singleton instance via `AvatarManager.shared`.
224
535
 
225
536
  ```typescript
226
- const manager = new AvatarManager()
537
+ // Get singleton instance
538
+ const manager = AvatarManager.shared
227
539
 
228
- // Load character
540
+ // Load avatar
229
541
  const avatar = await manager.load(
230
- characterId: string,
542
+ id: string,
231
543
  onProgress?: (progress: LoadProgressInfo) => void
232
544
  )
233
545
 
234
546
  // Clear cache
235
- manager.clearCache()
547
+ manager.clearAll()
236
548
  ```
237
549
 
238
550
  ### AvatarView
239
551
 
240
- 3D rendering view (rendering layer), responsible for 3D rendering only. Internally automatically creates and manages `AvatarController`.
552
+ 3D rendering view, responsible for rendering only. It internally creates and manages an `AvatarController` automatically.
553
+
554
+ ```typescript
555
+ constructor(avatar: Avatar, container: HTMLElement)
556
+ ```
241
557
 
242
- **⚠️ Important Limitation:** Currently, the SDK only supports one AvatarView instance at a time. If you need to switch characters, you must first call the `dispose()` method to clean up the current AvatarView, then create a new instance.
558
+ **Parameters:**
559
+ - `avatar`: Avatar instance
560
+ - `container`: Canvas container element (required)
561
+ - Canvas automatically uses the full size of the container (width and height)
562
+ - Canvas aspect ratio adapts to the container size; set the container size to control the aspect ratio
563
+ - Canvas will be automatically added to the container
564
+ - SDK automatically handles resize events via ResizeObserver
243
565
 
244
- **Playback Mode Configuration:**
566
+ **Playback Mode:**
567
+ - The playback mode is determined by `drivingServiceMode` in `AvatarSDK.initialize()` configuration
245
568
  - The playback mode is fixed when creating `AvatarView` and persists throughout its lifecycle
246
569
  - Cannot be changed after creation
247
570
 
248
571
  ```typescript
249
- import { AvatarPlaybackMode } from '@spatialwalk/avatarkit'
250
-
251
572
  // Create view (Canvas is automatically added to container)
252
- // Network mode (default)
253
573
  const container = document.getElementById('avatar-container')
254
- const avatarView = new AvatarView(avatar: Avatar, {
255
- container: container,
256
- playbackMode: AvatarPlaybackMode.network // Optional, default is 'network'
257
- })
258
-
259
- // External data mode
260
- const avatarView = new AvatarView(avatar: Avatar, {
261
- container: container,
262
- playbackMode: AvatarPlaybackMode.external
263
- })
574
+ const avatarView = new AvatarView(avatar, container)
264
575
 
265
- // Get Canvas element
266
- const canvas = avatarView.getCanvas()
576
+ // Wait for first frame to render
577
+ avatarView.onFirstRendering = () => {
578
+ // First frame rendered
579
+ }
267
580
 
268
- // Get playback mode
269
- const mode = avatarView.playbackMode // 'network' | 'external'
581
+ // Get or set avatar transform (position and scale)
582
+ // Get current transform
583
+ const currentTransform = avatarView.transform // { x: number, y: number, scale: number }
270
584
 
271
- // Update camera configuration
272
- avatarView.updateCameraConfig(cameraConfig: CameraConfig)
585
+ // Set transform
586
+ avatarView.transform = { x, y, scale }
587
+ // - x: Horizontal offset in normalized coordinates (-1 to 1, where -1 = left edge, 0 = center, 1 = right edge)
588
+ // - y: Vertical offset in normalized coordinates (-1 to 1, where -1 = bottom edge, 0 = center, 1 = top edge)
589
+ // - scale: Scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
273
590
 
274
- // Cleanup resources (must be called before switching characters)
591
+ // Cleanup resources (must be called before switching avatars)
275
592
  avatarView.dispose()
276
593
  ```
277
594
 
278
- **Character Switching Example:**
595
+ **Avatar Switching Example:**
279
596
 
280
597
  ```typescript
281
- // Before switching characters, must clean up old AvatarView first
598
+ // To switch avatars, simply dispose the old view and create a new one
282
599
  if (currentAvatarView) {
283
600
  currentAvatarView.dispose()
284
- currentAvatarView = null
285
601
  }
286
602
 
287
- // Load new character
603
+ // Load new avatar
288
604
  const newAvatar = await avatarManager.load('new-character-id')
289
605
 
290
- // Create new AvatarView (with same or different playback mode)
291
- currentAvatarView = new AvatarView(newAvatar, {
292
- container: container,
293
- playbackMode: AvatarPlaybackMode.network
294
- })
606
+ // Create new AvatarView
607
+ currentAvatarView = new AvatarView(newAvatar, container)
295
608
 
296
- // Network mode: start connection
297
- if (currentAvatarView.playbackMode === AvatarPlaybackMode.network) {
298
- await currentAvatarView.avatarController.start()
299
- }
609
+ // SDK mode: start the connection (throws an error if not in SDK mode)
610
+ await currentAvatarView.controller.start()
300
611
  ```
301
612
 
302
613
  ### AvatarController
303
614
 
304
- Audio/animation playback controller (playback layer), manages synchronized playback of audio and animation. Automatically composes `NetworkLayer` in network mode.
615
+ Audio/animation playback controller, manages synchronized playback of audio and animation. Automatically handles network communication in SDK mode.
305
616
 
306
617
  **Two Usage Patterns:**
307
618
 
308
- #### Network Mode Methods
619
+ #### SDK Mode Methods
309
620
 
310
621
  ```typescript
311
- // Start WebSocket service
312
- await avatarView.avatarController.start()
313
-
314
- // Send audio data (SDK handles receiving animation data automatically)
315
- avatarView.avatarController.send(audioData: ArrayBuffer, end: boolean)
316
- // audioData: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
317
- // - Sample rate: 16kHz (16000 Hz) - backend requirement
318
- // - Format: PCM16 (16-bit signed integer, little-endian)
319
- // - Channels: Mono (single channel)
320
- // - Example: 1 second = 16000 samples × 2 bytes = 32000 bytes
321
- // end: false (default) - Normal audio data sending, server will accumulate audio data, automatically returns animation data and starts synchronized playback of animation and audio after accumulating enough data
322
- // end: true - Immediately return animation data, no longer accumulating, used for ending current conversation or scenarios requiring immediate response
622
+ // ⚠️ CRITICAL: Initialize audio context first (MUST be called in user gesture context)
623
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
624
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
625
+ // All audio operations (start, send, etc.) require prior initialization.
626
+ button.addEventListener('click', async () => {
627
+ // Initialize audio context - MUST be in user gesture context
628
+ await avatarView.controller.initializeAudioContext()
629
+
630
+ // Start service
631
+ await avatarView.controller.start()
632
+
633
+ // Send audio data (must be mono PCM16 format matching configured sample rate)
634
+ const conversationId = avatarView.controller.send(audioData: ArrayBuffer, end: boolean)
635
+ // Returns: conversationId - Conversation ID for this conversation round
636
+ // end: false (default) - Continue sending audio data for current conversation
637
+ // end: true - Mark the end of current conversation round. After end=true, sending new audio data will interrupt any ongoing playback from the previous conversation round
638
+ })
323
639
 
324
- // Close WebSocket service
325
- avatarView.avatarController.close()
640
+ // Close service
641
+ avatarView.controller.close()
326
642
  ```
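In practice, a file's PCM16 data is usually split into fixed-size chunks before being passed to `send()`. A minimal chunking sketch (this helper is illustrative, not an SDK API; the 3200-byte chunk, 0.1 s at 16 kHz mono PCM16, is an arbitrary choice):

```typescript
// Split a PCM16 buffer into chunks of at most chunkBytes bytes.
// chunkBytes must be even so a chunk never splits a 16-bit sample.
function chunkPCM16(buffer: ArrayBuffer, chunkBytes: number): ArrayBuffer[] {
  if (chunkBytes <= 0 || chunkBytes % 2 !== 0) throw new Error('chunkBytes must be a positive even number')
  const chunks: ArrayBuffer[] = []
  for (let offset = 0; offset < buffer.byteLength; offset += chunkBytes) {
    chunks.push(buffer.slice(offset, Math.min(offset + chunkBytes, buffer.byteLength)))
  }
  return chunks
}

// Usage sketch (SDK mode): send every chunk, marking the last one with end=true
// to close the conversation round.
// const chunks = chunkPCM16(pcmData, 3200)
// chunks.forEach((chunk, i) => {
//   avatarView.controller.send(chunk, i === chunks.length - 1)
// })
```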
327
643
 
328
- #### External Data Mode Methods
644
+ #### Host Mode Methods
329
645
 
330
646
  ```typescript
331
- // Start playback with initial audio and animation data
332
- await avatarView.avatarController.play(
333
- initialAudioChunks?: Array<{ data: Uint8Array, isLast: boolean }>, // Initial audio chunks (16kHz mono PCM16)
334
- initialKeyframes?: any[] // Initial animation keyframes (obtained from your service)
335
- )
647
+ // ⚠️ CRITICAL: Initialize audio context first (MUST be called in user gesture context)
648
+ // This method MUST be called within a user gesture event handler (click, touchstart, etc.)
649
+ // to satisfy browser security policies. Calling it outside a user gesture will fail.
650
+ // All audio operations (yieldAudioData, yieldFramesData, etc.) require prior initialization.
651
+ button.addEventListener('click', async () => {
652
+ // Initialize audio context - MUST be in user gesture context
653
+ await avatarView.controller.initializeAudioContext()
654
+
655
+ // Stream audio chunks (must be mono PCM16 format matching configured sample rate)
656
+ const conversationId = avatarView.controller.yieldAudioData(
657
+ data: Uint8Array, // Audio chunk data (PCM16 format)
658
+ isLast: boolean = false // Whether this is the last chunk
659
+ )
660
+ // Returns: conversationId - Conversation ID for this audio session
661
+
662
+ // Stream animation keyframes (requires conversationId from audio data)
663
+ avatarView.controller.yieldFramesData(
664
+ keyframesDataArray: (Uint8Array | ArrayBuffer)[], // Animation keyframes binary data array
665
+ conversationId: string // Conversation ID (required)
666
+ )
667
+ })
668
+ ```
336
669
 
337
- // Stream additional audio chunks (after play() is called)
338
- avatarView.avatarController.sendAudioChunk(
339
- data: Uint8Array, // Audio chunk data
340
- isLast: boolean = false // Whether this is the last chunk
341
- )
670
+ **⚠️ Important: Conversation ID (conversationId) Management**
342
671
 
343
- // Stream additional animation keyframes (after play() is called)
344
- avatarView.avatarController.sendKeyframes(
345
- keyframes: any[] // Additional animation keyframes (obtained from your service)
346
- )
347
- ```
672
+ **SDK Mode:**
673
+ - `send()` returns a conversationId to distinguish each conversation round
674
+ - `end=true` marks the end of a conversation round
675
+
676
+ **Host Mode:**
677
+ - `yieldAudioData()` returns a conversationId (one is generated automatically when a new session starts)
678
+ - `yieldFramesData()` requires a valid conversationId parameter
679
+ - Animation data with mismatched conversationId will be **discarded**
680
+ - Use `getCurrentConversationId()` to retrieve the current active conversationId
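Since frames with a mismatched conversationId are discarded, a host-side guard can avoid queueing them at all. A hypothetical helper (not part of the SDK) sketching that check, where `activeConversationId` would come from `getCurrentConversationId()`:

```typescript
// Keep only frame batches that belong to the active conversation,
// mirroring the SDK's behavior of discarding mismatched animation data.
function filterFramesForConversation<T extends { conversationId: string }>(
  batches: T[],
  activeConversationId: string | null
): T[] {
  if (activeConversationId === null) return [] // no active session: nothing to deliver
  return batches.filter(batch => batch.conversationId === activeConversationId)
}
```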
348
681
 
349
682
  #### Common Methods (Both Modes)
350
683
 
351
684
  ```typescript
685
+
686
+ // Pause playback (from playing state)
687
+ avatarView.controller.pause()
688
+
689
+ // Resume playback (from paused state)
690
+ await avatarView.controller.resume()
691
+
352
692
  // Interrupt current playback (stops and clears data)
353
- avatarView.avatarController.interrupt()
693
+ avatarView.controller.interrupt()
354
694
 
355
695
  // Clear all data and resources
356
- avatarView.avatarController.clear()
696
+ avatarView.controller.clear()
357
697
 
358
- // Get connection state (network mode only)
359
- const isConnected = avatarView.avatarController.connected
698
+ // Get current conversation ID (for Host mode)
699
+ const conversationId = avatarView.controller.getCurrentConversationId()
700
+ // Returns: Current conversationId for the active audio session, or null if no active session
360
701
 
361
- // Start service (network mode only)
362
- await avatarView.avatarController.start()
702
+ // Volume control (affects only the avatar audio player, not the system volume)
703
+ avatarView.controller.setVolume(0.5) // Set volume to 50% (0.0 to 1.0)
704
+ const currentVolume = avatarView.controller.getVolume() // Get current volume (0.0 to 1.0)
363
705
 
364
- // Close service (network mode only)
365
- avatarView.avatarController.close()
706
+ // Set event callbacks
707
+ avatarView.controller.onConnectionState = (state: ConnectionState) => {} // SDK mode only
708
+ avatarView.controller.onConversationState = (state: ConversationState) => {}
709
+ avatarView.controller.onError = (error: Error) => {} // Usually AvatarError (includes code for SDK/server errors)
710
+ ```
366
711
 
367
- // Get current avatar state
368
- const state = avatarView.avatarController.state
712
+ #### Avatar Transform Methods
369
713
 
370
- // Set event callbacks
371
- avatarView.avatarController.onConnectionState = (state: ConnectionState) => {} // Network mode only
372
- avatarView.avatarController.onAvatarState = (state: AvatarState) => {}
373
- avatarView.avatarController.onError = (error: Error) => {}
714
+ ```typescript
715
+ // Get or set avatar transform (position and scale in canvas)
716
+ // Get current transform
717
+ const currentTransform = avatarView.transform // { x: number, y: number, scale: number }
718
+
719
+ // Set transform
720
+ avatarView.transform = { x, y, scale }
721
+ // - x: Horizontal offset in normalized coordinates (-1 to 1, where -1 = left edge, 0 = center, 1 = right edge)
722
+ // - y: Vertical offset in normalized coordinates (-1 to 1, where -1 = bottom edge, 0 = center, 1 = top edge)
723
+ // - scale: Scale factor (1.0 = original size, 2.0 = double size, 0.5 = half size)
724
+ // Example:
725
+ avatarView.transform = { x: 0, y: 0, scale: 1.0 } // Center, original size
726
+ avatarView.transform = { x: 0.5, y: 0, scale: 2.0 } // Right half, double size
374
727
  ```
375
728
 
376
729
  **Important Notes:**
377
- - `start()` and `close()` are only available in network mode
378
- - `play()`, `sendAudioChunk()`, and `sendKeyframes()` are only available in external data mode
379
- - `interrupt()` and `clear()` are available in both modes
730
+ - `start()` and `close()` are only available in SDK mode
731
+ - `yieldAudioData()` and `yieldFramesData()` are only available in Host mode
732
+ - `pause()`, `resume()`, `interrupt()`, `clear()`, `getCurrentConversationId()`, `setVolume()`, and `getVolume()` are available in both modes
380
733
  - The playback mode is determined when creating `AvatarView` and cannot be changed
381
734
 
382
735
  ## 🔧 Configuration
@@ -386,40 +739,55 @@ avatarView.avatarController.onError = (error: Error) => {}
386
739
  ```typescript
387
740
  interface Configuration {
388
741
  environment: Environment
742
+ drivingServiceMode?: DrivingServiceMode // Optional, default is 'sdk' (SDK mode)
743
+ logLevel?: LogLevel // Optional, default is 'off' (no logs)
744
+ audioFormat?: AudioFormat // Optional, default is { channelCount: 1, sampleRate: 16000 }
745
+ characterApiBaseUrl?: string // Optional, internal debug config, can be ignored
389
746
  }
390
- ```
391
747
 
392
- **Description:**
393
- - `environment`: Specifies the environment (cn/us/test), SDK will automatically use the corresponding API address and WebSocket address based on the environment
394
- - `sessionToken`: Set separately via `AvatarKit.setSessionToken()`, not in Configuration
395
-
396
- ```typescript
397
- enum Environment {
398
- cn = 'cn', // China region
399
- us = 'us', // US region
400
- test = 'test' // Test environment
748
+ interface AudioFormat {
749
+ readonly channelCount: 1 // Fixed to 1 (mono)
750
+ readonly sampleRate: number // Supported: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz, default: 16000
401
751
  }
402
752
  ```
403
753
 
404
- ### AvatarViewOptions
754
+ ### LogLevel
755
+
756
+ Controls the verbosity of SDK logs:
405
757
 
406
758
  ```typescript
407
- interface AvatarViewOptions {
408
- playbackMode?: AvatarPlaybackMode // Playback mode, default is 'network'
409
- container?: HTMLElement // Canvas container element
759
+ enum LogLevel {
760
+ off = 'off', // Disable all logs
761
+ error = 'error', // Only error logs
762
+ warning = 'warning', // Warning and error logs
763
+ all = 'all' // All logs (info, warning, error)
410
764
  }
411
765
  ```
412
766
 
767
+ **Note:** `LogLevel.off` completely disables all logging, including error logs. Use with caution in production environments.
768
+
413
769
  **Description:**
414
- - `playbackMode`: Specifies the playback mode (`'network'` or `'external'`), default is `'network'`
415
- - `'network'`: SDK handles WebSocket communication, send audio via `send()`
416
- - `'external'`: External components provide audio and animation data, SDK handles synchronized playback
417
- - `container`: Optional container element for Canvas, if not provided, Canvas will be created but not added to DOM
770
+ - `environment`: Specifies the environment (cn/intl), SDK will automatically use the corresponding server addresses based on the environment
771
+ - `drivingServiceMode`: Specifies the driving service mode
772
+ - `DrivingServiceMode.sdk` (default): SDK mode - SDK handles network communication automatically
773
+ - `DrivingServiceMode.host`: Host mode - Host application provides audio and animation data
774
+ - `logLevel`: Controls the verbosity of SDK logs
775
+ - `LogLevel.off` (default): Disable all logs
776
+ - `LogLevel.error`: Only error logs
777
+ - `LogLevel.warning`: Warning and error logs
778
+ - `LogLevel.all`: All logs (info, warning, error)
779
+ - `audioFormat`: Configures audio sample rate and channel count
780
+ - `channelCount`: Fixed to 1 (mono channel)
781
+ - `sampleRate`: Audio sample rate in Hz (default: 16000)
782
+ - Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000
783
+ - The configured sample rate will be used for both audio recording and playback
784
+ - `characterApiBaseUrl`: Internal debug config, can be ignored
785
+ - `sessionToken`: **Required for authentication**. Set separately via `AvatarSDK.setSessionToken()`, not in Configuration. See [Authentication](#-authentication) section for details
418
786
 
419
787
  ```typescript
420
- enum AvatarPlaybackMode {
421
- network = 'network', // Network mode: SDK handles WebSocket communication
422
- external = 'external' // External data mode: External provides data, SDK handles playback
788
+ enum Environment {
789
+ cn = 'cn', // China region
790
+ intl = 'intl', // International region
423
791
  }
424
792
  ```
425
793
 
@@ -450,89 +818,42 @@ enum ConnectionState {
450
818
  }
451
819
  ```
452
820
 
453
- ### AvatarState
821
+ ### ConversationState
454
822
 
455
823
  ```typescript
456
- enum AvatarState {
457
- idle = 'idle', // Idle state, showing breathing animation
458
- active = 'active', // Active, waiting for playable content
459
- playing = 'playing' // Playing
824
+ enum ConversationState {
825
+ idle = 'idle', // Idle state (breathing animation)
826
+ playing = 'playing', // Playing state (active conversation)
827
+ pausing = 'pausing' // Pausing state (paused during playback)
460
828
  }
461
829
  ```
462
830
 
463
- ## 🎨 Rendering System
464
-
465
- The SDK supports two rendering backends:
466
-
467
- - **WebGPU** - High-performance rendering for modern browsers
468
- - **WebGL** - Better compatibility traditional rendering
469
-
470
- The rendering system automatically selects the best backend, no manual configuration needed.
471
-
472
- ## 🔍 Debugging and Monitoring
473
-
474
- ### Logging System
475
-
476
- The SDK has a built-in complete logging system, supporting different levels of log output:
477
-
478
- ```typescript
479
- import { logger } from '@spatialwalk/avatarkit'
480
-
481
- // Set log level
482
- logger.setLevel('verbose') // 'basic' | 'verbose'
483
-
484
- // Manual log output
485
- logger.log('Info message')
486
- logger.warn('Warning message')
487
- logger.error('Error message')
488
- ```
489
-
490
- ### Performance Monitoring
491
-
492
- The SDK provides performance monitoring interfaces to monitor rendering performance:
831
+ **State Description:**
832
+ - `idle`: Avatar is in idle state (breathing animation), waiting for conversation to start
833
+ - `playing`: Avatar is playing conversation content (including during transition animations)
834
+ - `pausing`: Avatar playback is paused (e.g., when `end=false` and waiting for more audio data)
493
835
 
494
- ```typescript
495
- // Get rendering performance statistics
496
- const stats = avatarView.getPerformanceStats()
497
-
498
- if (stats) {
499
- console.log(`Render time: ${stats.renderTime.toFixed(2)}ms`)
500
- console.log(`Sort time: ${stats.sortTime.toFixed(2)}ms`)
501
- console.log(`Rendering backend: ${stats.backend}`)
502
-
503
- // Calculate frame rate
504
- const fps = 1000 / stats.renderTime
505
- console.log(`Frame rate: ${fps.toFixed(2)} FPS`)
506
- }
836
+ **Note:** During transition animations, the target state is notified immediately:
837
+ - When transitioning from `idle` to `playing`, the `playing` state is notified immediately
838
+ - When transitioning from `playing` to `idle`, the `idle` state is notified immediately
507
839
 
508
- // Regular performance monitoring
509
- setInterval(() => {
510
- const stats = avatarView.getPerformanceStats()
511
- if (stats) {
512
- // Send to monitoring service or display on UI
513
- console.log('Performance:', stats)
514
- }
515
- }, 1000)
516
- ```
840
+ ## 🎨 Rendering System
517
841
 
518
- **Performance Statistics Description:**
519
- - `renderTime`: Total rendering time (milliseconds), includes sorting and GPU rendering
520
- - `sortTime`: Sorting time (milliseconds), uses Radix Sort algorithm to depth-sort point cloud
521
- - `backend`: Currently used rendering backend (`'webgpu'` | `'webgl'` | `null`)
842
+ The SDK automatically selects the best rendering backend for your browser, no manual configuration needed.
522
843
 
523
844
  ## 🚨 Error Handling
524
845
 
525
- ### SPAvatarError
846
+ ### AvatarError
526
847
 
527
848
  The SDK uses custom error types, providing more detailed error information:
528
849
 
529
850
  ```typescript
530
- import { SPAvatarError } from '@spatialwalk/avatarkit'
851
+ import { AvatarError } from '@spatialwalk/avatarkit'
531
852
 
532
853
  try {
533
- await avatarView.avatarController.start()
854
+ await avatarView.controller.start()
534
855
  } catch (error) {
535
- if (error instanceof SPAvatarError) {
856
+ if (error instanceof AvatarError) {
536
857
  console.error('SDK Error:', error.message, error.code)
537
858
  } else {
538
859
  console.error('Unknown error:', error)
@@ -543,113 +864,73 @@ try {
543
864
  ### Error Callbacks
544
865
 
545
866
  ```typescript
546
- avatarView.avatarController.onError = (error: Error) => {
547
- console.error('AvatarController error:', error)
548
- // Handle error, such as reconnection, user notification, etc.
867
+ import { AvatarError } from '@spatialwalk/avatarkit'
868
+
869
+ avatarView.controller.onError = (error: Error) => {
870
+ if (error instanceof AvatarError) {
871
+ console.error('AvatarController error:', error.message, error.code)
872
+ return
873
+ }
874
+
875
+ console.error('AvatarController unknown error:', error)
549
876
  }
550
877
  ```
551
878
 
879
+ In SDK mode, a server `MESSAGE_SERVER_ERROR` message is forwarded to `onError` as an `AvatarError`:
880
+ - `error.message`: server-returned error message
881
+ - `error.code` mapping:
882
+ - `401` -> `sessionTokenExpired`
883
+ - `400` -> `sessionTokenInvalid`
884
+ - `404` -> `avatarIDUnrecognized`
885
+ - other HTTP status -> original status code string (for example, `"500"`)
886
+
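This mapping can be turned into a recovery strategy inside the error callback. A hypothetical helper (the action names are placeholders) that follows the code table above:

```typescript
// Map the documented AvatarError codes to a coarse recovery action.
function recoveryActionFor(code: string): 'refresh-token' | 'fix-config' | 'report' {
  switch (code) {
    case 'sessionTokenExpired': return 'refresh-token' // 401: fetch a new session token
    case 'sessionTokenInvalid': return 'fix-config'    // 400: token is malformed
    case 'avatarIDUnrecognized': return 'fix-config'   // 404: avatar id does not exist
    default: return 'report'                           // other HTTP status codes, e.g. "500"
  }
}
```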
552
887
  ## 🔄 Resource Management
553
888
 
554
889
  ### Lifecycle Management
555
890
 
556
- #### Network Mode Lifecycle
891
+ #### SDK Mode Lifecycle
557
892
 
558
893
  ```typescript
559
894
  // Initialize
560
895
  const container = document.getElementById('avatar-container')
561
- const avatarView = new AvatarView(avatar, {
562
- container: container,
563
- playbackMode: AvatarPlaybackMode.network
564
- })
565
- await avatarView.avatarController.start()
896
+ const avatarView = new AvatarView(avatar, container)
897
+ await avatarView.controller.start()
566
898
 
567
899
  // Use
568
- avatarView.avatarController.send(audioData, false)
900
+ avatarView.controller.send(audioData, false)
569
901
 
570
- // Cleanup
571
- avatarView.avatarController.close()
572
- avatarView.dispose() // Automatically cleans up all resources
902
+ // Cleanup - dispose() automatically cleans up all resources including connections
903
+ avatarView.dispose()
573
904
  ```
574
905
 
575
- #### External Data Mode Lifecycle
906
+ #### Host Mode Lifecycle
576
907
 
577
908
  ```typescript
578
909
  // Initialize
579
910
  const container = document.getElementById('avatar-container')
580
- const avatarView = new AvatarView(avatar, {
581
- container: container,
582
- playbackMode: AvatarPlaybackMode.external
583
- })
911
+ const avatarView = new AvatarView(avatar, container)
584
912
 
585
913
  // Use
586
- const initialAudioChunks = [{ data: audioData1, isLast: false }]
587
- await avatarView.avatarController.play(initialAudioChunks, initialKeyframes)
588
- avatarView.avatarController.sendAudioChunk(audioChunk, false)
589
- avatarView.avatarController.sendKeyframes(keyframes)
914
+ const conversationId = avatarView.controller.yieldAudioData(audioChunk, false)
915
+ avatarView.controller.yieldFramesData(keyframesDataArray, conversationId)
590
916
 
591
- // Cleanup
592
- avatarView.avatarController.clear() // Clear all data and resources
593
- avatarView.dispose() // Automatically cleans up all resources
917
+ // Cleanup - dispose() automatically cleans up all resources including playback data
918
+ avatarView.dispose()
594
919
  ```
595
920
 
596
- **⚠️ Important Notes:**
597
- - SDK currently only supports one AvatarView instance at a time
598
- - When switching characters, must first call `dispose()` to clean up old AvatarView, then create new instance
599
- - Not properly cleaning up may cause resource leaks and rendering errors
600
- - In network mode, call `close()` before `dispose()` to properly close WebSocket connections
601
- - In external data mode, call `clear()` before `dispose()` to clear all playback data
921
+ **⚠️ Important Notes:**
922
+ - `dispose()` automatically cleans up all resources, including:
923
+ - Network connections (SDK mode)
924
+ - Playback data and animation resources (both modes)
925
+ - Render system and canvas elements
926
+ - All event listeners and callbacks
927
+ - Not properly calling `dispose()` may cause resource leaks and rendering errors
928
+ - You may call `avatarView.controller.close()` (SDK mode) or `avatarView.controller.clear()` (both modes) before disposing, but this is not required; `dispose()` handles it automatically
602
929
 
603
930
  ### Memory Optimization
604
931
 
605
- - SDK automatically manages WASM memory allocation
606
- - Supports dynamic loading/unloading of character and animation resources
607
- - Provides memory usage monitoring interface
608
-
609
- ### Audio Data Sending
610
-
611
- #### Network Mode
612
-
613
- The `send()` method receives audio data in `ArrayBuffer` format:
614
-
615
- **Audio Format Requirements:**
616
- - **Sample Rate**: 16kHz (16000 Hz) - **Backend requirement, must be exactly 16kHz**
617
- - **Format**: PCM16 (16-bit signed integer, little-endian)
618
- - **Channels**: Mono (single channel)
619
- - **Data Size**: Each sample is 2 bytes, so 1 second of audio = 16000 samples × 2 bytes = 32000 bytes
620
-
621
- **Usage:**
622
- - `audioData`: Audio data (ArrayBuffer format, must be 16kHz mono PCM16)
623
- - `end=false` (default) - Normal audio data sending, server will accumulate audio data, automatically returns animation data and starts synchronized playback of animation and audio after accumulating enough data
624
- - `end=true` - Immediately return animation data, no longer accumulating, used for ending current conversation or scenarios requiring immediate response
625
- - **Important**: No need to wait for `end=true` to start playing, it will automatically start playing after accumulating enough audio data
626
-
627
- #### External Data Mode
628
-
629
- The `play()` method starts playback with initial data, then use `sendAudioChunk()` to stream additional audio:
630
-
631
- **Audio Format Requirements:**
632
- - Same as network mode: 16kHz mono PCM16 format
633
- - Audio data should be provided as `Uint8Array` in chunks with `isLast` flag
634
-
635
- **Usage:**
636
- ```typescript
637
- // Start playback with initial audio and animation data
638
- // Note: Audio and animation data should be obtained from your backend service
639
- const initialAudioChunks = [
640
- { data: audioData1, isLast: false },
641
- { data: audioData2, isLast: false }
642
- ]
643
- await avatarController.play(initialAudioChunks, initialKeyframes)
644
-
645
- // Stream additional audio chunks
646
- avatarController.sendAudioChunk(audioChunk, isLast)
647
- ```
648
-
649
- **Resampling (Both Modes):**
650
- - If your audio source is at a different sample rate (e.g., 24kHz, 48kHz), you **must** resample it to 16kHz before sending
651
- - For high-quality resampling, use Web Audio API's `OfflineAudioContext` with anti-aliasing filtering
652
- - See example projects (`vanilla`, `react`, `vue`) for complete resampling implementation
932
+ - SDK automatically manages memory allocation
933
+ - Supports dynamic loading/unloading of avatar and animation resources
653
934
 
654
935
  ## 🌐 Browser Compatibility
655
936
 
@@ -669,6 +950,5 @@ Issues and Pull Requests are welcome!
669
950
  ## 📞 Support
670
951
 
671
952
  For questions, please contact:
672
- - Email: support@spavatar.com
673
- - Documentation: https://docs.spavatar.com
674
- - GitHub: https://github.com/spavatar/sdk
953
+ - Email: code@spatialwalk.net
954
+ - Documentation: https://docs.spatialreal.ai