@omote/core 0.5.3 → 0.5.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,19 +1,21 @@
1
1
  # @omote/core
2
2
 
3
- > WebGPU-accelerated inference for real-time lip sync, speech recognition, emotion detection, and avatar animation - runs entirely in browser.
3
+ > Client-side AI inference for real-time lip sync, speech recognition, and avatar animation that runs entirely in the browser via WebGPU and WASM.
4
4
 
5
5
  ## Features
6
6
 
7
- - **Lip Sync** - LAM (Wav2Vec2) inference (audio → 52 ARKit blendshapes directly)
8
- - **Speech-to-Text** - Whisper ASR (tiny/base models)
9
- - **Emotion Detection** - DistilHuBERT speech emotion recognition (7 emotions)
10
- - **Voice Activity Detection** - Silero VAD for speech detection
11
- - **Emotion Control** - 10-channel emotion system with presets and transitions
12
- - **Model Caching** - IndexedDB-based caching for fast subsequent loads
13
- - **Structured Logging** - 6 log levels, JSON/pretty formats, module-scoped
14
- - **Telemetry** - OpenTelemetry-compatible tracing and metrics
15
- - **Offline Ready** - Works entirely without internet
16
- - **WebGPU + WASM** - Auto-fallback for broad browser support
7
+ - **Lip Sync (A2E)** — Audio to 52 ARKit blendshapes via Wav2Vec2, with automatic GPU/CPU platform detection
8
+ - **Full-Face Pipeline** — TTS audio playback with lip sync, ExpressionProfile scaling, and gapless scheduling
9
+ - **Speech Recognition** — SenseVoice ASR (ONNX), 15x faster than Whisper, progressive transcription
10
+ - **Voice Activity Detection** — Silero VAD with Worker and main-thread modes
11
+ - **Text-to-Speech** — ChatterboxTurbo (experimental; use server-side TTS in production)
12
+ - **Animation Graph** — State machine (idle/listening/thinking/speaking) with emotion blending
13
+ - **Emotion Controller** — Preset-based emotion system with smooth transitions
14
+ - **Model Caching** — IndexedDB with versioning, LRU eviction, and quota monitoring
15
+ - **Microphone Capture** — Browser noise suppression, echo cancellation, AGC
16
+ - **Logging & Telemetry** — Structured logging (6 levels) and OpenTelemetry-compatible tracing
17
+ - **Offline Ready** — No cloud dependencies, works entirely without internet
18
+ - **WebGPU + WASM** — WebGPU-first with automatic WASM fallback
17
19
 
18
20
  ## Installation
19
21
 
@@ -21,563 +23,340 @@
21
23
  npm install @omote/core
22
24
  ```
23
25
 
26
+ Dependency note: `onnxruntime-web` is included — no additional installs needed.
27
+
24
28
  ## Quick Start
25
29
 
26
- ### Lip Sync (Wav2Vec2Inference)
30
+ ### FullFacePipeline (TTS Lip Sync)
31
+
32
+ The most common use case: feed TTS audio chunks and get back 52 ARKit blendshape frames at render rate.
27
33
 
28
34
  ```typescript
29
- import { Wav2Vec2Inference, LAM_BLENDSHAPES } from '@omote/core';
35
+ import { FullFacePipeline, createA2E } from '@omote/core';
30
36
 
31
- const lam = new Wav2Vec2Inference({
32
- modelUrl: '/models/lam-wav2vec2.onnx',
33
- backend: 'auto', // 'webgpu' | 'wasm' | 'auto'
37
+ // 1. Create A2E backend (auto-detects GPU vs CPU)
38
+ const lam = createA2E({
39
+ gpuModelUrl: '/models/lam-wav2vec2.onnx',
40
+ cpuModelUrl: '/models/wav2arkit_cpu.onnx',
41
+ mode: 'auto',
34
42
  });
35
-
36
43
  await lam.load();
37
44
 
38
- // Process audio (16kHz Float32Array)
39
- const result = await lam.infer(audioSamples);
40
-
41
- // result.blendshapes is an array of frames, each with 52 ARKit weights
42
- for (const frame of result.blendshapes) {
43
- const jawOpen = frame[LAM_BLENDSHAPES.indexOf('jawOpen')];
44
- applyToAvatar(frame);
45
- }
46
- ```
47
-
48
- ### Speech-to-Text (WhisperInference)
49
-
50
- ```typescript
51
- import { WhisperInference } from '@omote/core';
45
+ // 2. Create pipeline with expression profile
46
+ const pipeline = new FullFacePipeline({
47
+ lam,
48
+ sampleRate: 16000,
49
+ profile: { mouth: 1.0, jaw: 1.0, brows: 0.6, eyes: 0.0, cheeks: 0.5, nose: 0.3, tongue: 0.5 },
50
+ });
51
+ await pipeline.initialize();
52
52
 
53
- // Models auto-download from HuggingFace
54
- const whisper = new WhisperInference({ model: 'tiny' });
55
- await whisper.load();
53
+ // 3. Listen for blendshape frames
54
+ pipeline.on('full_frame_ready', (frame) => {
55
+ applyToAvatar(frame.blendshapes); // ExpressionProfile-scaled, 52 ARKit weights
56
+ });
56
57
 
57
- const { text, inferenceTimeMs } = await whisper.transcribe(audioSamples);
58
- console.log(text); // "Hello world"
58
+ // 4. Feed TTS audio and play
59
+ pipeline.start();
60
+ await pipeline.onAudioChunk(ttsAudioChunk); // Uint8Array PCM16
61
+ await pipeline.end(); // Flush remaining audio
59
62
  ```
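The snippets above call an `applyToAvatar` helper that the SDK does not ship. A minimal sketch for a three.js-style mesh whose morph targets use ARKit blendshape names (the `MorphMesh` shape and helper are assumptions, not SDK API):

```typescript
// Hypothetical helper, not part of @omote/core. Maps a 52-weight frame onto
// a three.js-style mesh whose morph targets are named after ARKit blendshapes.
interface MorphMesh {
  morphTargetDictionary: Record<string, number>; // blendshape name -> influence index
  morphTargetInfluences: number[];
}

function applyToAvatar(
  mesh: MorphMesh,
  blendshapes: Float32Array,
  names: readonly string[], // e.g. the LAM_BLENDSHAPES export
): void {
  for (let i = 0; i < blendshapes.length && i < names.length; i++) {
    const idx = mesh.morphTargetDictionary[names[i]];
    if (idx !== undefined) mesh.morphTargetInfluences[idx] = blendshapes[i];
  }
}
```

Any engine with named morph/blendshape channels can be driven the same way; only the lookup differs.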
60
63
 
61
- ### Emotion Detection (DistilHuBERTEmotionInference)
62
-
63
- ```typescript
64
- import { DistilHuBERTEmotionInference, EMOTION_LABELS } from '@omote/core';
64
+ ## API Reference
65
65
 
66
- const emotion = new DistilHuBERTEmotionInference({
67
- modelUrl: '/models/distilhubert-emotion.onnx'
68
- });
69
- await emotion.load();
66
+ ### A2E (Audio to Expression)
70
67
 
71
- const { emotion: detected, probabilities } = await emotion.infer(audioSamples);
72
- console.log(detected); // 'happy', 'sad', 'angry', etc.
73
- ```
68
+ #### Factory API (Recommended)
74
69
 
75
- ### Voice Activity Detection (SileroVADInference)
70
+ Auto-detects platform: Chrome/Edge/Android use WebGPU, Safari/iOS use WASM CPU fallback.
76
71
 
77
72
  ```typescript
78
- import { SileroVADInference } from '@omote/core';
73
+ import { createA2E } from '@omote/core';
79
74
 
80
- const vad = new SileroVADInference({
81
- modelUrl: '/models/silero-vad.onnx'
75
+ const a2e = createA2E({
76
+ gpuModelUrl: '/models/lam-wav2vec2.onnx', // 384MB, WebGPU
77
+ cpuModelUrl: '/models/wav2arkit_cpu.onnx', // 404MB, WASM
78
+ mode: 'auto', // 'auto' | 'gpu' | 'cpu'
79
+ fallbackOnError: true, // GPU failure → auto-switch to CPU
82
80
  });
83
- await vad.load();
81
+ await a2e.load();
84
82
 
85
- const { isSpeech, probability } = await vad.infer(audioSamples);
83
+ const { blendshapes } = await a2e.infer(audioSamples); // Float32Array (16kHz)
84
+ // → 52 ARKit blendshape weights
86
85
  ```
87
86
 
88
- ### Emotion Control
87
+ #### Direct API
89
88
 
90
89
  ```typescript
91
- import {
92
- EmotionController,
93
- createEmotionVector,
94
- EmotionPresets,
95
- } from '@omote/core';
96
-
97
- // Create emotion vectors
98
- const emotion = createEmotionVector({ joy: 0.8, amazement: 0.2 });
90
+ import { Wav2Vec2Inference, LAM_BLENDSHAPES } from '@omote/core';
99
91
 
100
- // Or use controller for smooth transitions
101
- const controller = new EmotionController();
102
- controller.setPreset('happy');
103
- controller.transitionTo({ sadness: 0.7 }, 500); // 500ms transition
92
+ const lam = new Wav2Vec2Inference({ modelUrl: '/models/lam-wav2vec2.onnx' });
93
+ await lam.load();
104
94
 
105
- // In animation loop
106
- controller.update();
107
- const currentEmotion = controller.emotion; // Float32Array(26)
95
+ const { blendshapes } = await lam.infer(audioSamples);
96
+ const jawOpen = blendshapes[LAM_BLENDSHAPES.indexOf('jawOpen')];
108
97
  ```
109
98
 
110
- **Available Emotions:** `amazement`, `anger`, `cheekiness`, `disgust`, `fear`, `grief`, `joy`, `outofbreath`, `pain`, `sadness`
111
-
112
- **Presets:** `neutral`, `happy`, `sad`, `angry`, `surprised`, `scared`, `disgusted`, `excited`, `tired`, `playful`, `pained`, `contemplative`
99
+ ### FullFacePipeline
113
100
 
114
- ### Microphone Capture
101
+ End-to-end TTS playback with lip sync inference, audio scheduling, and ExpressionProfile scaling.
115
102
 
116
103
  ```typescript
117
- import { MicrophoneCapture } from '@omote/core';
104
+ import { FullFacePipeline } from '@omote/core';
118
105
 
119
- const mic = new MicrophoneCapture({
106
+ const pipeline = new FullFacePipeline({
107
+ lam, // A2E backend from createA2E()
120
108
  sampleRate: 16000,
121
- bufferSize: 4096,
109
+ profile: { mouth: 1.0, jaw: 1.0, brows: 0.6, eyes: 0.0, cheeks: 0.5, nose: 0.3, tongue: 0.5 },
122
110
  });
111
+ await pipeline.initialize();
123
112
 
124
- mic.on('audio', async ({ samples }) => {
125
- // Process audio samples
126
- const result = await lam.infer(samples);
113
+ pipeline.on('full_frame_ready', (frame) => {
114
+ // frame.blendshapes — ExpressionProfile-scaled
115
+ // frame.rawBlendshapes unscaled original values
116
+ applyToAvatar(frame.blendshapes);
127
117
  });
128
118
 
129
- await mic.start();
119
+ pipeline.start();
120
+ await pipeline.onAudioChunk(chunk); // feed TTS audio (Uint8Array PCM16)
121
+ await pipeline.end(); // flush final partial chunk
130
122
  ```
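`onAudioChunk` takes PCM16 bytes. If your TTS engine emits Float32 samples, a conversion sketch (the helper name is ours, not an SDK export):

```typescript
// Hypothetical helper: packs [-1, 1] Float32 samples into little-endian
// PCM16 bytes, the format onAudioChunk expects.
function floatToPCM16(samples: Float32Array): Uint8Array {
  const out = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid wraparound
    out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(out.buffer);
}
```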
131
123
 
132
- ### Logging
124
+ ### A2EProcessor
133
125
 
134
- ```typescript
135
- import { configureLogging, createLogger } from '@omote/core';
126
+ Engine-agnostic audio-to-blendshapes processor for custom integrations. Supports pull mode (timestamped frames for TTS) and push mode (drip-feed for live mic).
136
127
 
137
- // Configure globally (once at app startup)
138
- configureLogging({
139
- level: 'debug', // 'error' | 'warn' | 'info' | 'debug' | 'trace' | 'verbose'
140
- format: 'pretty', // 'json' | 'pretty'
141
- enabled: true,
142
- });
143
-
144
- // Create module-specific loggers
145
- const logger = createLogger('MyComponent');
146
-
147
- logger.info('Model loaded', { backend: 'webgpu', loadTimeMs: 1234 });
148
- logger.debug('Processing audio', { samples: 16000 });
149
- logger.error('Failed to load', { error: err.message });
128
+ ```typescript
129
+ import { A2EProcessor } from '@omote/core';
150
130
 
151
- // JSON output (production):
152
- // {"timestamp":1704672000000,"level":"info","module":"MyComponent","message":"Model loaded","data":{"backend":"webgpu"}}
131
+ const processor = new A2EProcessor({ backend: lam, chunkSize: 16000 });
153
132
 
154
- // Pretty output (development):
155
- // [12:00:00.000] INFO [MyComponent] Model loaded { backend: 'webgpu' }
133
+ // Pull mode: timestamp audio for later retrieval
134
+ processor.pushAudio(samples, audioContext.currentTime + delay);
135
+ const frame = processor.getFrameForTime(audioContext.currentTime);
156
136
  ```
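In pull mode, each `pushAudio` call pairs a chunk with the time it will be audible. One way to slice a longer decoded buffer into `chunkSize` pieces (hypothetical helper, not SDK API):

```typescript
// Hypothetical helper: slices a decoded buffer into fixed-size chunks and
// computes the playback time at which each chunk starts.
function toTimedChunks(
  samples: Float32Array,
  startTime: number,   // e.g. audioContext.currentTime + delay
  chunkSize = 16000,   // 1 s at 16 kHz
  sampleRate = 16000,
): Array<{ samples: Float32Array; time: number }> {
  const chunks: Array<{ samples: Float32Array; time: number }> = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push({
      samples: samples.subarray(i, i + chunkSize), // last chunk may be shorter
      time: startTime + i / sampleRate,
    });
  }
  return chunks;
}

// for (const c of toTimedChunks(buffer, audioContext.currentTime + 0.1)) {
//   processor.pushAudio(c.samples, c.time);
// }
```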
157
137
 
158
- ### Telemetry
138
+ ### Speech Recognition (SenseVoice)
159
139
 
160
- OpenTelemetry-compatible observability for inference operations.
140
+ SenseVoice ASR is 15x faster than Whisper, with progressive transcription and emotion detection.
161
141
 
162
142
  ```typescript
163
- import { configureTelemetry, getTelemetry } from '@omote/core';
143
+ import { SenseVoiceInference } from '@omote/core';
164
144
 
165
- // Development: Console output
166
- configureTelemetry({
167
- enabled: true,
168
- serviceName: 'my-app',
169
- exporter: 'console',
145
+ const asr = new SenseVoiceInference({
146
+ modelUrl: '/models/sensevoice/model.int8.onnx',
170
147
  });
148
+ await asr.load();
171
149
 
172
- // Production: OTLP export to Jaeger/Tempo/etc
173
- configureTelemetry({
174
- enabled: true,
175
- serviceName: 'my-app',
176
- serviceVersion: '1.0.0',
177
- exporter: 'otlp',
178
- exporterConfig: {
179
- endpoint: 'https://tempo.example.com',
180
- headers: { 'Authorization': 'Bearer token' },
181
- },
182
- sampling: {
183
- ratio: 0.1, // Sample 10% of traces
184
- alwaysSampleErrors: true, // Always capture errors
185
- },
186
- });
150
+ const { text, emotion, language } = await asr.transcribe(audioSamples);
151
+ ```
187
152
 
188
- // Manual instrumentation
189
- const telemetry = getTelemetry();
153
+ #### Platform-Aware ASR
190
154
 
191
- // Create spans for custom operations
192
- const span = telemetry.startSpan('custom-operation', {
193
- 'custom.attribute': 'value',
194
- });
195
- try {
196
- // ... do work
197
- span.setStatus('ok');
198
- } catch (error) {
199
- span.setStatus('error', error);
200
- } finally {
201
- span.end();
202
- }
203
-
204
- // Record metrics
205
- telemetry.recordMetric('custom_counter', 1, 'counter', { label: 'value' });
206
- telemetry.recordMetric('custom_gauge', 42.5, 'gauge');
155
+ ```typescript
156
+ import { shouldUseNativeASR, SafariSpeechRecognition, SenseVoiceInference } from '@omote/core';
157
+
158
+ const asr = shouldUseNativeASR()
159
+ ? new SafariSpeechRecognition({ language: 'en-US' })
160
+ : new SenseVoiceInference({ modelUrl: '/models/sensevoice/model.int8.onnx' });
207
161
  ```
208
162
 
209
- ### Model Caching
163
+ ### Voice Activity Detection (Silero VAD)
210
164
 
211
- Automatic IndexedDB caching for ONNX models.
165
+ #### Factory API (Recommended)
212
166
 
213
167
  ```typescript
214
- import { getModelCache, fetchWithCache, preloadModels, formatBytes } from '@omote/core';
215
-
216
- // Fetch with automatic caching (used internally by inference classes)
217
- const modelData = await fetchWithCache('/models/lam-wav2vec2.onnx', (loaded, total) => {
218
- console.log(`Loading: ${formatBytes(loaded)} / ${formatBytes(total)}`);
219
- });
168
+ import { createSileroVAD } from '@omote/core';
220
169
 
221
- // Preload multiple models
222
- await preloadModels([
223
- '/models/lam-wav2vec2.onnx',
224
- '/models/silero-vad.onnx',
225
- ], (completed, total, url) => {
226
- console.log(`Preloaded ${completed}/${total}: ${url}`);
170
+ const vad = createSileroVAD({
171
+ modelUrl: '/models/silero-vad.onnx',
172
+ threshold: 0.5,
173
+ // useWorker: true // Force off-main-thread
174
+ // useWorker: false // Force main thread
227
175
  });
176
+ await vad.load();
228
177
 
229
- // Manual cache management
230
- const cache = getModelCache();
231
- const stats = await cache.getStats();
232
- console.log(`Cached: ${stats.modelCount} models, ${formatBytes(stats.totalSize)}`);
233
-
234
- await cache.delete('/models/old-model.onnx');
235
- await cache.clear(); // Clear all cached models
178
+ const { isSpeech, probability } = await vad.process(audioSamples);
236
179
  ```
237
180
 
238
- ## Models
181
+ #### Direct API
239
182
 
240
- ### Required Files
183
+ ```typescript
184
+ import { SileroVADInference, SileroVADWorker } from '@omote/core';
241
185
 
242
- Place models in your public assets folder:
186
+ // Main thread (mobile-friendly)
187
+ const vad = new SileroVADInference({ modelUrl: '/models/silero-vad.onnx' });
243
188
 
244
- ```
245
- public/
246
- models/
247
- lam-wav2vec2.onnx # LAM lip sync model
248
- silero-vad.onnx # Voice activity detection
249
- distilhubert-emotion.onnx # Emotion detection
189
+ // Web Worker (desktop, off-main-thread)
190
+ const vadWorker = new SileroVADWorker({ modelUrl: '/models/silero-vad.onnx' });
250
191
  ```
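Per-chunk probabilities are noisy, so most apps add hysteresis before toggling UI or pipeline state. A minimal sketch (thresholds and hold length are our choices, not SDK defaults):

```typescript
// Hypothetical smoothing layer over VAD results: enters "speaking" above
// onThreshold, leaves only after holdChunks consecutive low-probability chunks.
class SpeechGate {
  private speaking = false;
  private quiet = 0;
  constructor(
    private onThreshold = 0.6,
    private offThreshold = 0.4,
    private holdChunks = 5,
  ) {}

  update(probability: number): boolean {
    if (probability >= this.onThreshold) {
      this.speaking = true;
      this.quiet = 0;
    } else if (this.speaking) {
      this.quiet = probability < this.offThreshold ? this.quiet + 1 : 0;
      if (this.quiet >= this.holdChunks) this.speaking = false;
    }
    return this.speaking;
  }
}
```

Feed it `probability` from each `vad.process()` call; the returned boolean is stable enough to drive state transitions.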
251
192
 
252
- ### Whisper Models (Auto-Download)
193
+ ### Animation Graph
253
194
 
254
- Whisper models download automatically from HuggingFace on first use:
255
-
256
- | Model | Size | Quality | Speed |
257
- |-------|------|---------|-------|
258
- | `tiny` | ~75MB | Good | Fastest |
259
- | `base` | ~150MB | Better | Medium |
195
+ State machine for avatar animation states with emotion blending and audio energy.
260
196
 
261
197
  ```typescript
262
- const whisper = new WhisperInference({ model: 'tiny' }); // Recommended for real-time
263
- ```
198
+ import { AnimationGraph, AudioEnergyAnalyzer, EmphasisDetector } from '@omote/core';
264
199
 
265
- ## API Reference
200
+ const graph = new AnimationGraph();
266
201
 
267
- ### Wav2Vec2Inference (LAM)
202
+ graph.on('state.change', ({ from, to, trigger }) => {
203
+ console.log(`${from} → ${to}`);
204
+ });
268
205
 
269
- LAM lip sync - audio to 52 ARKit blendshapes directly.
206
+ graph.on('output.update', (output) => applyToAvatar(output));
270
207
 
271
- | Method | Description |
272
- |--------|-------------|
273
- | `new Wav2Vec2Inference(config)` | Create with `{ modelUrl, backend? }` |
274
- | `load()` | Load ONNX model |
275
- | `infer(audio)` | Run inference |
276
- | `dispose()` | Release resources |
208
+ // State transitions
209
+ graph.trigger('user_speech_start'); // idle → listening
210
+ graph.trigger('transcript_ready'); // listening → thinking
211
+ graph.trigger('ai_audio_start'); // thinking → speaking
212
+ graph.trigger('ai_audio_end'); // speaking → idle
277
213
 
278
- ```typescript
279
- interface Wav2Vec2Result {
280
- blendshapes: Float32Array[]; // Array of frames, each 52 ARKit weights
281
- inferenceTimeMs: number;
282
- backend: 'webgpu' | 'wasm';
283
- }
214
+ // Blend emotion and audio energy into output
215
+ graph.setEmotion('happy', 0.8);
216
+ graph.setAudioEnergy(0.7);
217
+ graph.update(deltaTime); // call each frame
284
218
  ```
285
219
 
286
- ### WhisperInference
287
-
288
- Whisper speech-to-text.
220
+ **States:** `idle` → `listening` → `thinking` → `speaking` → `idle`
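The documented triggers can be pictured as a transition table; a simplified model (the real `AnimationGraph` also handles blending and interrupts):

```typescript
type AvatarState = 'idle' | 'listening' | 'thinking' | 'speaking';

// Simplified model of the documented transitions, not the SDK internals.
const transitions: Record<string, [AvatarState, AvatarState]> = {
  user_speech_start: ['idle', 'listening'],
  transcript_ready: ['listening', 'thinking'],
  ai_audio_start: ['thinking', 'speaking'],
  ai_audio_end: ['speaking', 'idle'],
};

function nextState(current: AvatarState, trigger: string): AvatarState {
  const t = transitions[trigger];
  return t && t[0] === current ? t[1] : current; // invalid triggers are ignored
}
```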
289
221
 
290
- | Method | Description |
291
- |--------|-------------|
292
- | `new WhisperInference(config)` | Create with `{ model, modelUrl? }` |
293
- | `load()` | Load encoder + decoder |
294
- | `transcribe(audio)` | Transcribe audio |
295
- | `dispose()` | Release resources |
222
+ ### Emotion Controller
296
223
 
297
- ### DistilHuBERTEmotionInference
298
-
299
- Speech emotion recognition (7 emotions).
300
-
301
- | Method | Description |
302
- |--------|-------------|
303
- | `new DistilHuBERTEmotionInference(config)` | Create with `{ modelUrl }` |
304
- | `load()` | Load ONNX model |
305
- | `infer(audio)` | Detect emotion |
306
- | `dispose()` | Release resources |
307
-
308
- **Emotion Labels:** `angry`, `disgusted`, `fearful`, `happy`, `neutral`, `sad`, `surprised`
309
-
310
- ### SileroVADInference
311
-
312
- Voice activity detection.
313
-
314
- | Method | Description |
315
- |--------|-------------|
316
- | `new SileroVADInference(config)` | Create with `{ modelUrl }` |
317
- | `load()` | Load ONNX model |
318
- | `infer(audio)` | Detect speech |
319
- | `dispose()` | Release resources |
320
-
321
- ### EmotionController
322
-
323
- Emotion state with smooth transitions.
324
-
325
- | Method | Description |
326
- |--------|-------------|
327
- | `set(weights)` | Set emotion immediately |
328
- | `setPreset(name)` | Set preset immediately |
329
- | `transitionTo(weights, ms)` | Smooth transition |
330
- | `transitionToPreset(name, ms)` | Transition to preset |
331
- | `update()` | Update transition (call each frame) |
332
- | `reset()` | Reset to neutral |
333
-
334
- | Property | Type | Description |
335
- |----------|------|-------------|
336
- | `emotion` | `Float32Array` | Current 26-element vector |
337
- | `isTransitioning` | `boolean` | Transition in progress |
338
-
339
- ### Logger
340
-
341
- Structured logging with multiple output formats.
342
-
343
- | Function | Description |
344
- |----------|-------------|
345
- | `configureLogging(config)` | Set global logging configuration |
346
- | `createLogger(module)` | Create a module-specific logger |
347
- | `getGlobalLogger()` | Get the global logger instance |
348
-
349
- **Logger Methods:**
350
-
351
- | Method | Description |
352
- |--------|-------------|
353
- | `error(message, data?)` | Log error (always shown) |
354
- | `warn(message, data?)` | Log warning |
355
- | `info(message, data?)` | Log info |
356
- | `debug(message, data?)` | Log debug |
357
- | `trace(message, data?)` | Log trace |
358
- | `verbose(message, data?)` | Log verbose (most detailed) |
359
- | `child(subModule)` | Create child logger with prefixed module |
224
+ ```typescript
225
+ import { EmotionController, EmotionPresets } from '@omote/core';
360
226
 
361
- **Configuration:**
227
+ const controller = new EmotionController();
228
+ controller.setPreset('happy');
229
+ controller.transitionTo({ joy: 0.8 }, 500); // 500ms smooth transition
362
230
 
363
- ```typescript
364
- interface LoggerConfig {
365
- level?: 'error' | 'warn' | 'info' | 'debug' | 'trace' | 'verbose';
366
- enabled?: boolean;
367
- format?: 'json' | 'pretty';
368
- sink?: (entry: LogEntry) => void; // Custom output handler
369
- }
231
+ // In animation loop
232
+ controller.update();
233
+ const current = controller.emotion;
370
234
  ```
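Conceptually, each `update()` during a transition interpolates between the current and target emotion vectors, roughly:

```typescript
// Conceptual sketch of a transition step; not the SDK implementation.
// t goes from 0 to 1 over the transition duration.
function lerpEmotion(from: Float32Array, to: Float32Array, t: number): Float32Array {
  const out = new Float32Array(from.length);
  for (let i = 0; i < from.length; i++) out[i] = from[i] + (to[i] - from[i]) * t;
  return out;
}
```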
371
235
 
372
- ### OmoteTelemetry
236
+ **Presets:** `neutral`, `happy`, `sad`, `angry`, `surprised`, `scared`, `disgusted`, `excited`, `tired`, `playful`, `pained`, `contemplative`
373
237
 
374
- OpenTelemetry-compatible telemetry for tracing and metrics.
238
+ ### Model Caching
375
239
 
376
- | Function | Description |
377
- |----------|-------------|
378
- | `configureTelemetry(config)` | Initialize telemetry system |
379
- | `getTelemetry()` | Get global telemetry instance |
240
+ IndexedDB-based caching with versioning, LRU eviction, and storage quota monitoring.
380
241
 
381
- **OmoteTelemetry Methods:**
242
+ ```typescript
243
+ import { getModelCache, fetchWithCache, preloadModels } from '@omote/core';
382
244
 
383
- | Method | Description |
384
- |--------|-------------|
385
- | `startSpan(name, attributes?)` | Start a new trace span |
386
- | `recordMetric(name, value, type, attributes?)` | Record a metric |
387
- | `flush()` | Force flush all pending data |
388
- | `shutdown()` | Shutdown telemetry system |
245
+ // Fetch with automatic caching
246
+ const data = await fetchWithCache('/models/model.onnx');
389
247
 
390
- **Span Methods:**
248
+ // Versioned caching for model updates
249
+ const data = await fetchWithCache('/models/model.onnx', {
250
+ version: '1.0.0',
251
+ validateStale: true,
252
+ });
391
253
 
392
- | Method | Description |
393
- |--------|-------------|
394
- | `setAttribute(key, value)` | Add attribute to span |
395
- | `setStatus(status, error?)` | Set span status ('ok' or 'error') |
396
- | `end()` | End the span |
254
+ // Cache quota monitoring
255
+ import { configureCacheLimit, getQuotaInfo } from '@omote/core';
397
256
 
398
- **Configuration:**
257
+ configureCacheLimit({
258
+ maxSizeBytes: 500 * 1024 * 1024, // 500MB limit
259
+ onQuotaWarning: (info) => console.warn(`Storage ${info.percentUsed}% used`),
260
+ });
399
261
 
400
- ```typescript
401
- interface TelemetryConfig {
402
- enabled: boolean;
403
- serviceName: string;
404
- serviceVersion?: string;
405
- exporter: 'console' | 'otlp' | 'none';
406
- exporterConfig?: {
407
- endpoint: string;
408
- headers?: Record<string, string>;
409
- timeoutMs?: number;
410
- };
411
- sampling?: {
412
- ratio?: number; // 0.0 to 1.0
413
- alwaysSampleErrors?: boolean;
414
- };
415
- }
262
+ // Cache stats
263
+ const cache = getModelCache();
264
+ const stats = await cache.getStats(); // { totalSize, modelCount, models }
416
265
  ```
417
266
 
418
- ### ModelCache
267
+ ### Microphone Capture
419
268
 
420
- IndexedDB-based model caching.
269
+ ```typescript
270
+ import { MicrophoneCapture } from '@omote/core';
421
271
 
422
- | Function | Description |
423
- |----------|-------------|
424
- | `getModelCache()` | Get singleton cache instance |
425
- | `fetchWithCache(url, onProgress?)` | Fetch with automatic caching |
426
- | `preloadModels(urls, onProgress?)` | Preload multiple models |
427
- | `formatBytes(bytes)` | Format bytes as human-readable |
272
+ const mic = new MicrophoneCapture({
273
+ sampleRate: 16000,
274
+ bufferSize: 4096,
275
+ });
428
276
 
429
- **ModelCache Methods:**
277
+ mic.on('audio', ({ samples }) => {
278
+ // Process 16kHz Float32Array samples
279
+ });
430
280
 
431
- | Method | Description |
432
- |--------|-------------|
433
- | `has(url)` | Check if model is cached |
434
- | `get(url)` | Get cached model data |
435
- | `set(url, data, etag?)` | Store model in cache |
436
- | `delete(url)` | Remove model from cache |
437
- | `clear()` | Clear all cached models |
438
- | `getStats()` | Get cache statistics |
281
+ await mic.start();
282
+ ```
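Mic samples can also drive `AnimationGraph.setAudioEnergy`; a simple RMS energy sketch (the helper is ours, not an SDK export):

```typescript
// Hypothetical helper: root-mean-square energy of a sample buffer, a
// reasonable input for AnimationGraph.setAudioEnergy (roughly 0..1 for speech).
function rmsEnergy(samples: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}
```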
439
283
 
440
- ### Utility Functions
284
+ ### Logging
441
285
 
442
286
  ```typescript
443
- // Create emotion vector from named weights
444
- createEmotionVector({ joy: 0.8, amazement: 0.2 }): Float32Array
445
-
446
- // Blend multiple emotions
447
- blendEmotions([
448
- { vector: preset1, weight: 0.7 },
449
- { vector: preset2, weight: 0.3 },
450
- ]): Float32Array
287
+ import { configureLogging, createLogger } from '@omote/core';
451
288
 
452
- // Linear interpolation
453
- lerpEmotion(from, to, t): Float32Array
289
+ configureLogging({ level: 'debug', format: 'pretty' });
454
290
 
455
- // Get preset copy
456
- getEmotionPreset('happy'): Float32Array
291
+ const logger = createLogger('MyModule');
292
+ logger.info('Model loaded', { backend: 'webgpu', loadTimeMs: 1234 });
457
293
  ```
458
294
 
459
- ## ARKit Blendshapes
460
-
461
- 52 output blendshapes compatible with ARKit:
462
-
463
- ```
464
- eyeBlinkLeft, eyeLookDownLeft, eyeLookInLeft, eyeLookOutLeft, eyeLookUpLeft,
465
- eyeSquintLeft, eyeWideLeft, eyeBlinkRight, eyeLookDownRight, eyeLookInRight,
466
- eyeLookOutRight, eyeLookUpRight, eyeSquintRight, eyeWideRight,
467
- jawForward, jawLeft, jawRight, jawOpen,
468
- mouthClose, mouthFunnel, mouthPucker, mouthLeft, mouthRight,
469
- mouthSmileLeft, mouthSmileRight, mouthFrownLeft, mouthFrownRight,
470
- mouthDimpleLeft, mouthDimpleRight, mouthStretchLeft, mouthStretchRight,
471
- mouthRollLower, mouthRollUpper, mouthShrugLower, mouthShrugUpper,
472
- mouthPressLeft, mouthPressRight, mouthLowerDownLeft, mouthLowerDownRight,
473
- mouthUpperUpLeft, mouthUpperUpRight,
474
- browDownLeft, browDownRight, browInnerUp, browOuterUpLeft, browOuterUpRight,
475
- cheekPuff, cheekSquintLeft, cheekSquintRight,
476
- noseSneerLeft, noseSneerRight, tongueOut
477
- ```
295
+ ### Telemetry
478
296
 
479
- ## Technical Specifications
297
+ OpenTelemetry-compatible tracing and metrics.
480
298
 
481
- ### Audio Input
299
+ ```typescript
300
+ import { configureTelemetry, getTelemetry } from '@omote/core';
482
301
 
483
- | Parameter | Value |
484
- |-----------|-------|
485
- | Sample Rate | 16kHz |
486
- | Format | Float32Array or Int16Array |
302
+ configureTelemetry({
303
+ enabled: true,
304
+ serviceName: 'my-app',
305
+ exporter: 'console', // or 'otlp' for production
306
+ });
487
307
 
488
- ### Wav2Vec2 (LAM) Model
308
+ const telemetry = getTelemetry();
309
+ const span = telemetry.startSpan('custom-operation');
310
+ // ... do work
311
+ span.end();
312
+ ```
489
313
 
490
- | Parameter | Value |
491
- |-----------|-------|
492
- | Input | 16kHz audio samples |
493
- | Output | 52 ARKit blendshapes per frame |
494
- | Frame Rate | 30fps |
495
- | Backend | WebGPU / WASM |
314
+ ## Models
496
315
 
497
- ### DistilHuBERT Emotion Labels
316
+ Place models in your public assets directory:
498
317
 
499
318
  ```
500
- angry, disgusted, fearful, happy, neutral, sad, surprised
319
+ public/models/
320
+ lam-wav2vec2.onnx # A2E lip sync — WebGPU (384MB)
321
+ wav2arkit_cpu.onnx # A2E lip sync — WASM fallback (1.86MB graph)
322
+ wav2arkit_cpu.onnx.data # A2E lip sync — WASM fallback (402MB weights)
323
+ sensevoice/model.int8.onnx # SenseVoice ASR (239MB)
324
+ silero-vad.onnx # Voice activity detection (~2MB)
501
325
  ```
502
326
 
503
- ## AI Conversation (Platform Integration)
327
+ ## Browser Compatibility
504
328
 
505
- For production deployments with the Omote Platform, use the `AgentCoreAdapter` which handles:
329
+ WebGPU-first with automatic WASM fallback.
506
330
 
507
- - WebSocket connection to AgentCore backend
508
- - Local Whisper ASR for speech-to-text
509
- - Receives TTS audio from backend (ElevenLabs handled server-side)
510
- - Local LAM inference for lip sync animation
331
+ | Browser | WebGPU | WASM | Recommended |
332
+ |---------|--------|------|-------------|
333
+ | Chrome 113+ (Desktop) | Yes | Yes | WebGPU |
334
+ | Chrome 113+ (Android) | Yes | Yes | WebGPU |
335
+ | Edge 113+ | Yes | Yes | WebGPU |
336
+ | Firefox 130+ | Flag only | Yes | WASM |
337
+ | Safari 18+ (macOS) | Limited | Yes | WASM |
338
+ | Safari (iOS) | No | Yes | WASM |
511
339
 
512
340
  ```typescript
513
- import { AgentCoreAdapter, ConversationOrchestrator } from '@omote/core';
514
-
515
- const orchestrator = new ConversationOrchestrator({
516
- adapter: {
517
- endpoint: 'wss://your-agentcore-endpoint.com/ws',
518
- models: {
519
- lamUrl: '/models/lam-wav2vec2.onnx',
520
- },
521
- },
522
- });
523
-
524
- // Register tenant
525
- orchestrator.registerTenant({
526
- tenantId: 'tenant-123',
527
- characterId: 'character-abc',
528
- credentials: { authToken: 'jwt-token' },
529
- });
530
-
531
- // Create session
532
- const session = await orchestrator.createSession('tenant-123', {
533
- systemPrompt: 'You are a helpful assistant.',
534
- });
535
-
536
- // Listen for animation events
537
- orchestrator.on('animation', ({ blendshapes }) => {
538
- applyToAvatar(blendshapes);
539
- });
540
-
541
- // Push audio from microphone
542
- session.pushAudio(audioSamples);
341
+ import { isWebGPUAvailable } from '@omote/core';
342
+ const webgpu = await isWebGPUAvailable();
543
343
  ```
544
344
 
545
- ## Browser Support
546
-
547
- | Browser | WebGPU | WASM Fallback |
548
- |---------|--------|---------------|
549
- | Chrome 113+ | Yes | Yes |
550
- | Edge 113+ | Yes | Yes |
551
- | Firefox | No | Yes |
552
- | Safari 18+ | Yes | Yes |
345
+ ## iOS Notes
553
346
 
554
- The SDK auto-detects WebGPU support and falls back to WASM when unavailable.
347
+ All iOS browsers use WebKit under the hood. The SDK handles three platform constraints automatically:
555
348
 
556
- ## iOS Support
349
+ 1. **WASM binary selection** — iOS crashes with the default JSEP/ASYNCIFY WASM binary. The SDK imports `onnxruntime-web/wasm` (non-JSEP) on iOS/Safari.
350
+ 2. **A2E model fallback** — The Wav2Vec2 GPU model exceeds iOS memory limits. `createA2E({ mode: 'auto' })` automatically selects the `wav2arkit_cpu` model on iOS.
351
+ 3. **Worker memory** — Multiple Workers each load their own ORT WASM runtime, exceeding iOS tab memory (~1.5GB). The SDK defaults to main-thread inference on iOS.
557
352
 
558
- iOS Safari has WebGPU API but ONNX Runtime has memory and threading limitations. The SDK provides automatic detection and optimized fallbacks:
559
-
560
- | Feature | iOS Status | Alternative |
561
- |---------|------------|-------------|
562
- | **VAD** | Works (0.9ms) | Use as-is |
563
- | **ASR** | Slow (1.3s) | `SafariSpeechRecognition` |
564
- | **Lip Sync** | Slow (332ms) | Lambda LAM (server-side) |
565
-
566
- ```typescript
567
- import { shouldUseNativeASR, SafariSpeechRecognition } from '@omote/core';
568
-
569
- // Platform-aware ASR
570
- if (shouldUseNativeASR()) {
571
- const speech = new SafariSpeechRecognition({ language: 'en-US' });
572
- speech.onResult((result) => console.log(result.text));
573
- await speech.start();
574
- } else {
575
- const whisper = new WhisperInference({ model: 'tiny' });
576
- await whisper.load();
577
- }
578
- ```
353
  **Consumer requirement:** Skip COEP/COOP headers for iOS to avoid enabling SharedArrayBuffer, which forces threaded WASM with 4GB of shared memory and crashes iOS. Keep COEP/COOP on desktop for multi-threaded performance.
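One way to serve the headers conditionally, sketched as a framework-agnostic function (the user-agent heuristic is our assumption; adapt to your server):

```typescript
// Hypothetical server-side sketch: return COOP/COEP only for non-iOS,
// non-Safari clients so desktop gets threaded WASM while iOS avoids
// SharedArrayBuffer entirely.
function crossOriginHeaders(userAgent: string): Record<string, string> {
  const isIOSOrSafari =
    /iPad|iPhone|iPod/.test(userAgent) ||
    (/Safari/.test(userAgent) && !/Chrome|Chromium|Edg/.test(userAgent));
  if (isIOSOrSafari) return {};
  return {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'require-corp',
  };
}
```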
579
354
 
580
- See the [iOS Integration Guide](../../docs/ios-integration.md) for complete setup including Lambda LAM deployment.
355
+ | Feature | iOS Status | Notes |
356
+ |---------|------------|-------|
357
+ | Silero VAD | Works | 0.9ms latency |
358
+ | SenseVoice ASR | Works | WASM, ~200ms |
359
+ | A2E Lip Sync | Works | wav2arkit_cpu via createA2E auto-detect, ~45ms |
581
360
 
582
361
  ## License
583
362