@omote/core 0.1.0

package/README.md ADDED
@@ -0,0 +1,584 @@
+ # @omote/core
+
+ > WebGPU-accelerated inference for real-time lip sync, speech recognition, emotion detection, and avatar animation - runs entirely in the browser.
+
+ ## Features
+
+ - **Lip Sync** - LAM (Wav2Vec2) inference mapping audio directly to 52 ARKit blendshapes
+ - **Speech-to-Text** - Whisper ASR (tiny/base models)
+ - **Emotion Detection** - DistilHuBERT speech emotion recognition (7 emotions)
+ - **Voice Activity Detection** - Silero VAD for speech detection
+ - **Emotion Control** - 10-channel emotion system with presets and transitions
+ - **Model Caching** - IndexedDB-based caching for fast subsequent loads
+ - **Structured Logging** - 6 log levels, JSON/pretty formats, module-scoped
+ - **Telemetry** - OpenTelemetry-compatible tracing and metrics
+ - **Offline Ready** - Works entirely without internet
+ - **WebGPU + WASM** - Auto-fallback for broad browser support
+
+ ## Installation
+
+ ```bash
+ npm install @omote/core
+ ```
+
+ ## Quick Start
+
+ ### Lip Sync (Wav2Vec2Inference)
+
+ ```typescript
+ import { Wav2Vec2Inference, LAM_BLENDSHAPES } from '@omote/core';
+
+ const lam = new Wav2Vec2Inference({
+   modelUrl: '/models/lam-wav2vec2.onnx',
+   backend: 'auto', // 'webgpu' | 'wasm' | 'auto'
+ });
+
+ await lam.load();
+
+ // Process audio (16kHz Float32Array)
+ const result = await lam.infer(audioSamples);
+
+ // result.blendshapes is an array of frames, each with 52 ARKit weights
+ for (const frame of result.blendshapes) {
+   const jawOpen = frame[LAM_BLENDSHAPES.indexOf('jawOpen')];
+   applyToAvatar(frame);
+ }
+ ```
+
+ ### Speech-to-Text (WhisperInference)
+
+ ```typescript
+ import { WhisperInference } from '@omote/core';
+
+ // Models auto-download from HuggingFace
+ const whisper = new WhisperInference({ model: 'tiny' });
+ await whisper.load();
+
+ const { text, inferenceTimeMs } = await whisper.transcribe(audioSamples);
+ console.log(text); // "Hello world"
+ ```
+
+ ### Emotion Detection (DistilHuBERTEmotionInference)
+
+ ```typescript
+ import { DistilHuBERTEmotionInference, EMOTION_LABELS } from '@omote/core';
+
+ const emotion = new DistilHuBERTEmotionInference({
+   modelUrl: '/models/distilhubert-emotion.onnx'
+ });
+ await emotion.load();
+
+ const { emotion: detected, probabilities } = await emotion.infer(audioSamples);
+ console.log(detected); // 'happy', 'sad', 'angry', etc.
+ ```
+
+ ### Voice Activity Detection (SileroVADInference)
+
+ ```typescript
+ import { SileroVADInference } from '@omote/core';
+
+ const vad = new SileroVADInference({
+   modelUrl: '/models/silero-vad.onnx'
+ });
+ await vad.load();
+
+ const { isSpeech, probability } = await vad.infer(audioSamples);
+ ```
+
+ ### Emotion Control
+
+ ```typescript
+ import {
+   EmotionController,
+   createEmotionVector,
+   EmotionPresets,
+ } from '@omote/core';
+
+ // Create emotion vectors
+ const emotion = createEmotionVector({ joy: 0.8, amazement: 0.2 });
+
+ // Or use controller for smooth transitions
+ const controller = new EmotionController();
+ controller.setPreset('happy');
+ controller.transitionTo({ sadness: 0.7 }, 500); // 500ms transition
+
+ // In animation loop
+ controller.update();
+ const currentEmotion = controller.emotion; // Float32Array(26)
+ ```
+
+ **Available Emotions:** `amazement`, `anger`, `cheekiness`, `disgust`, `fear`, `grief`, `joy`, `outofbreath`, `pain`, `sadness`
+
+ **Presets:** `neutral`, `happy`, `sad`, `angry`, `surprised`, `scared`, `disgusted`, `excited`, `tired`, `playful`, `pained`, `contemplative`
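+
+ A minimal sketch tying presets to transitions (preset names as listed above; the one-second duration is arbitrary):
+
+ ```typescript
+ import { EmotionController, getEmotionPreset } from '@omote/core';
+
+ const controller = new EmotionController();
+
+ // Jump straight to a preset, then ease into another over one second
+ controller.setPreset('excited');
+ controller.transitionToPreset('contemplative', 1000);
+
+ // getEmotionPreset returns a copy, so it is safe to modify
+ const happyBase = getEmotionPreset('happy');
+ ```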
+
+ ### Microphone Capture
+
+ ```typescript
+ import { MicrophoneCapture } from '@omote/core';
+
+ const mic = new MicrophoneCapture({
+   sampleRate: 16000,
+   bufferSize: 4096,
+ });
+
+ mic.on('audio', async ({ samples }) => {
+   // Process audio samples (lam from the Lip Sync example above)
+   const result = await lam.infer(samples);
+ });
+
+ await mic.start();
+ ```
+
+ ### Logging
+
+ ```typescript
+ import { configureLogging, createLogger } from '@omote/core';
+
+ // Configure globally (once at app startup)
+ configureLogging({
+   level: 'debug', // 'error' | 'warn' | 'info' | 'debug' | 'trace' | 'verbose'
+   format: 'pretty', // 'json' | 'pretty'
+   enabled: true,
+ });
+
+ // Create module-specific loggers
+ const logger = createLogger('MyComponent');
+
+ logger.info('Model loaded', { backend: 'webgpu', loadTimeMs: 1234 });
+ logger.debug('Processing audio', { samples: 16000 });
+ logger.error('Failed to load', { error: err.message });
+
+ // JSON output (production):
+ // {"timestamp":1704672000000,"level":"info","module":"MyComponent","message":"Model loaded","data":{"backend":"webgpu"}}
+
+ // Pretty output (development):
+ // [12:00:00.000] INFO [MyComponent] Model loaded { backend: 'webgpu' }
+ ```
+
+ ### Telemetry
+
+ OpenTelemetry-compatible observability for inference operations.
+
+ ```typescript
+ import { configureTelemetry, getTelemetry } from '@omote/core';
+
+ // Development: Console output
+ configureTelemetry({
+   enabled: true,
+   serviceName: 'my-app',
+   exporter: 'console',
+ });
+
+ // Production: OTLP export to Jaeger/Tempo/etc
+ configureTelemetry({
+   enabled: true,
+   serviceName: 'my-app',
+   serviceVersion: '1.0.0',
+   exporter: 'otlp',
+   exporterConfig: {
+     endpoint: 'https://tempo.example.com',
+     headers: { 'Authorization': 'Bearer token' },
+   },
+   sampling: {
+     ratio: 0.1, // Sample 10% of traces
+     alwaysSampleErrors: true, // Always capture errors
+   },
+ });
+
+ // Manual instrumentation
+ const telemetry = getTelemetry();
+
+ // Create spans for custom operations
+ const span = telemetry.startSpan('custom-operation', {
+   'custom.attribute': 'value',
+ });
+ try {
+   // ... do work
+   span.setStatus('ok');
+ } catch (error) {
+   span.setStatus('error', error);
+ } finally {
+   span.end();
+ }
+
+ // Record metrics
+ telemetry.recordMetric('custom_counter', 1, 'counter', { label: 'value' });
+ telemetry.recordMetric('custom_gauge', 42.5, 'gauge');
+ ```
+
+ ### Model Caching
+
+ Automatic IndexedDB caching for ONNX models.
+
+ ```typescript
+ import { getModelCache, fetchWithCache, preloadModels, formatBytes } from '@omote/core';
+
+ // Fetch with automatic caching (used internally by inference classes)
+ const modelData = await fetchWithCache('/models/lam-wav2vec2.onnx', (loaded, total) => {
+   console.log(`Loading: ${formatBytes(loaded)} / ${formatBytes(total)}`);
+ });
+
+ // Preload multiple models
+ await preloadModels([
+   '/models/lam-wav2vec2.onnx',
+   '/models/silero-vad.onnx',
+ ], (completed, total, url) => {
+   console.log(`Preloaded ${completed}/${total}: ${url}`);
+ });
+
+ // Manual cache management
+ const cache = getModelCache();
+ const stats = await cache.getStats();
+ console.log(`Cached: ${stats.modelCount} models, ${formatBytes(stats.totalSize)}`);
+
+ await cache.delete('/models/old-model.onnx');
+ await cache.clear(); // Clear all cached models
+ ```
+
+ ## Models
+
+ ### Required Files
+
+ Place models in your public assets folder:
+
+ ```
+ public/
+   models/
+     lam-wav2vec2.onnx          # LAM lip sync model
+     silero-vad.onnx            # Voice activity detection
+     distilhubert-emotion.onnx  # Emotion detection
+ ```
+
+ ### Whisper Models (Auto-Download)
+
+ Whisper models download automatically from HuggingFace on first use:
+
+ | Model | Size | Quality | Speed |
+ |-------|------|---------|-------|
+ | `tiny` | ~75MB | Good | Fastest |
+ | `base` | ~150MB | Better | Medium |
+
+ ```typescript
+ const whisper = new WhisperInference({ model: 'tiny' }); // Recommended for real-time
+ ```
+
+ ## API Reference
+
+ ### Wav2Vec2Inference (LAM)
+
+ LAM lip sync: maps audio directly to 52 ARKit blendshapes.
+
+ | Method | Description |
+ |--------|-------------|
+ | `new Wav2Vec2Inference(config)` | Create with `{ modelUrl, backend? }` |
+ | `load()` | Load ONNX model |
+ | `infer(audio)` | Run inference |
+ | `dispose()` | Release resources |
+
+ ```typescript
+ interface Wav2Vec2Result {
+   blendshapes: Float32Array[]; // Array of frames, each 52 ARKit weights
+   inferenceTimeMs: number;
+   backend: 'webgpu' | 'wasm';
+ }
+ ```
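+
+ All inference classes share the same load/infer/dispose lifecycle. A minimal sketch (the silent one-second buffer stands in for real 16kHz audio):
+
+ ```typescript
+ import { Wav2Vec2Inference } from '@omote/core';
+
+ const audioSamples = new Float32Array(16000); // Placeholder: 1s of silence at 16kHz
+
+ const lam = new Wav2Vec2Inference({ modelUrl: '/models/lam-wav2vec2.onnx' });
+ await lam.load();
+ try {
+   const { blendshapes, inferenceTimeMs, backend } = await lam.infer(audioSamples);
+   console.log(`${blendshapes.length} frames in ${inferenceTimeMs}ms via ${backend}`);
+ } finally {
+   lam.dispose(); // Release GPU/WASM resources when finished
+ }
+ ```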
+
+ ### WhisperInference
+
+ Whisper speech-to-text.
+
+ | Method | Description |
+ |--------|-------------|
+ | `new WhisperInference(config)` | Create with `{ model, modelUrl? }` |
+ | `load()` | Load encoder + decoder |
+ | `transcribe(audio)` | Transcribe audio |
+ | `dispose()` | Release resources |
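+
+ The optional `modelUrl` allows self-hosting instead of the HuggingFace auto-download; the exact file layout it expects is an assumption here, so verify against your deployment:
+
+ ```typescript
+ import { WhisperInference } from '@omote/core';
+
+ // Default: weights are fetched from HuggingFace on first load()
+ const hosted = new WhisperInference({ model: 'base' });
+
+ // Assumed usage: point modelUrl at self-hosted weights for offline use
+ const selfHosted = new WhisperInference({
+   model: 'tiny',
+   modelUrl: '/models/whisper-tiny',
+ });
+ await selfHosted.load();
+ ```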
+
+ ### DistilHuBERTEmotionInference
+
+ Speech emotion recognition (7 emotions).
+
+ | Method | Description |
+ |--------|-------------|
+ | `new DistilHuBERTEmotionInference(config)` | Create with `{ modelUrl }` |
+ | `load()` | Load ONNX model |
+ | `infer(audio)` | Detect emotion |
+ | `dispose()` | Release resources |
+
+ **Emotion Labels:** `angry`, `disgusted`, `fearful`, `happy`, `neutral`, `sad`, `surprised`
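+
+ The detector's labels map naturally onto `EmotionController` presets; a sketch where the label-to-preset table is my own choice, not part of the SDK:
+
+ ```typescript
+ import { DistilHuBERTEmotionInference, EmotionController } from '@omote/core';
+
+ // Hypothetical mapping from detected labels to controller presets
+ const presetFor: Record<string, string> = {
+   angry: 'angry', disgusted: 'disgusted', fearful: 'scared', happy: 'happy',
+   neutral: 'neutral', sad: 'sad', surprised: 'surprised',
+ };
+
+ const detector = new DistilHuBERTEmotionInference({
+   modelUrl: '/models/distilhubert-emotion.onnx',
+ });
+ await detector.load();
+
+ const controller = new EmotionController();
+ const audioSamples = new Float32Array(16000); // Placeholder audio
+ const { emotion } = await detector.infer(audioSamples);
+ controller.transitionToPreset(presetFor[emotion] ?? 'neutral', 500);
+ ```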
+
+ ### SileroVADInference
+
+ Voice activity detection.
+
+ | Method | Description |
+ |--------|-------------|
+ | `new SileroVADInference(config)` | Create with `{ modelUrl }` |
+ | `load()` | Load ONNX model |
+ | `infer(audio)` | Detect speech |
+ | `dispose()` | Release resources |
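+
+ A common pattern is gating heavier inference on VAD; a sketch assuming per-chunk processing via `MicrophoneCapture` fits your latency budget:
+
+ ```typescript
+ import { MicrophoneCapture, SileroVADInference } from '@omote/core';
+
+ const vad = new SileroVADInference({ modelUrl: '/models/silero-vad.onnx' });
+ await vad.load();
+
+ const mic = new MicrophoneCapture({ sampleRate: 16000, bufferSize: 4096 });
+ mic.on('audio', async ({ samples }) => {
+   const { isSpeech, probability } = await vad.infer(samples);
+   if (isSpeech) {
+     // Hand speech chunks to ASR / lip sync here
+     console.log(`Speech detected (p=${probability.toFixed(2)})`);
+   }
+ });
+ await mic.start();
+ ```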
+
+ ### EmotionController
+
+ Emotion state with smooth transitions.
+
+ | Method | Description |
+ |--------|-------------|
+ | `set(weights)` | Set emotion immediately |
+ | `setPreset(name)` | Set preset immediately |
+ | `transitionTo(weights, ms)` | Smooth transition |
+ | `transitionToPreset(name, ms)` | Transition to preset |
+ | `update()` | Update transition (call each frame) |
+ | `reset()` | Reset to neutral |
+
+ | Property | Type | Description |
+ |----------|------|-------------|
+ | `emotion` | `Float32Array` | Current 26-element vector |
+ | `isTransitioning` | `boolean` | Transition in progress |
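+
+ `update()` is meant to be called once per frame; a minimal render-loop sketch (`applyToAvatar` is a placeholder for your renderer binding):
+
+ ```typescript
+ import { EmotionController } from '@omote/core';
+
+ declare function applyToAvatar(weights: Float32Array): void; // Placeholder
+
+ const controller = new EmotionController();
+ controller.transitionToPreset('happy', 750);
+
+ function tick() {
+   controller.update(); // Advances any in-flight transition
+   applyToAvatar(controller.emotion); // Current 26-element vector
+   requestAnimationFrame(tick);
+ }
+ requestAnimationFrame(tick);
+ ```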
+
+ ### Logger
+
+ Structured logging with multiple output formats.
+
+ | Function | Description |
+ |----------|-------------|
+ | `configureLogging(config)` | Set global logging configuration |
+ | `createLogger(module)` | Create a module-specific logger |
+ | `getGlobalLogger()` | Get the global logger instance |
+
+ **Logger Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `error(message, data?)` | Log error (always shown) |
+ | `warn(message, data?)` | Log warning |
+ | `info(message, data?)` | Log info |
+ | `debug(message, data?)` | Log debug |
+ | `trace(message, data?)` | Log trace |
+ | `verbose(message, data?)` | Log verbose (most detailed) |
+ | `child(subModule)` | Create child logger with prefixed module |
+
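+ `child()` keeps module names hierarchical without reconfiguring; a small sketch (the exact separator in the emitted module name is an assumption):
+
+ ```typescript
+ import { createLogger } from '@omote/core';
+
+ const logger = createLogger('Pipeline');
+ const asrLog = logger.child('ASR'); // Module shown as e.g. "Pipeline/ASR"
+
+ asrLog.info('Transcription started', { model: 'tiny' });
+ ```
+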
+ **Configuration:**
+
+ ```typescript
+ interface LoggerConfig {
+   level?: 'error' | 'warn' | 'info' | 'debug' | 'trace' | 'verbose';
+   enabled?: boolean;
+   format?: 'json' | 'pretty';
+   sink?: (entry: LogEntry) => void; // Custom output handler
+ }
+ ```
+
+ ### OmoteTelemetry
+
+ OpenTelemetry-compatible telemetry for tracing and metrics.
+
+ | Function | Description |
+ |----------|-------------|
+ | `configureTelemetry(config)` | Initialize telemetry system |
+ | `getTelemetry()` | Get global telemetry instance |
+
+ **OmoteTelemetry Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `startSpan(name, attributes?)` | Start a new trace span |
+ | `recordMetric(name, value, type, attributes?)` | Record a metric |
+ | `flush()` | Force flush all pending data |
+ | `shutdown()` | Shutdown telemetry system |
+
+ **Span Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `setAttribute(key, value)` | Add attribute to span |
+ | `setStatus(status, error?)` | Set span status ('ok' or 'error') |
+ | `end()` | End the span |
+
+ **Configuration:**
+
+ ```typescript
+ interface TelemetryConfig {
+   enabled: boolean;
+   serviceName: string;
+   serviceVersion?: string;
+   exporter: 'console' | 'otlp' | 'none';
+   exporterConfig?: {
+     endpoint: string;
+     headers?: Record<string, string>;
+     timeoutMs?: number;
+   };
+   sampling?: {
+     ratio?: number; // 0.0 to 1.0
+     alwaysSampleErrors?: boolean;
+   };
+ }
+ ```
+
+ ### ModelCache
+
+ IndexedDB-based model caching.
+
+ | Function | Description |
+ |----------|-------------|
+ | `getModelCache()` | Get singleton cache instance |
+ | `fetchWithCache(url, onProgress?)` | Fetch with automatic caching |
+ | `preloadModels(urls, onProgress?)` | Preload multiple models |
+ | `formatBytes(bytes)` | Format bytes as human-readable |
+
+ **ModelCache Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `has(url)` | Check if model is cached |
+ | `get(url)` | Get cached model data |
+ | `set(url, data, etag?)` | Store model in cache |
+ | `delete(url)` | Remove model from cache |
+ | `clear()` | Clear all cached models |
+ | `getStats()` | Get cache statistics |
+
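+ A sketch of the lower-level methods (assuming `get` resolves to the stored bytes and `set` accepts an ArrayBuffer; `fetchWithCache` covers this path for you):
+
+ ```typescript
+ import { getModelCache } from '@omote/core';
+
+ const cache = getModelCache();
+ const url = '/models/lam-wav2vec2.onnx';
+
+ if (!(await cache.has(url))) {
+   const res = await fetch(url);
+   // Assumption: set() takes raw bytes plus an optional ETag for validation
+   await cache.set(url, await res.arrayBuffer(), res.headers.get('etag') ?? undefined);
+ }
+
+ const bytes = await cache.get(url); // Assumed to resolve to the cached bytes
+ ```
+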
+ ### Utility Functions
+
+ ```typescript
+ // Create emotion vector from named weights
+ createEmotionVector({ joy: 0.8, amazement: 0.2 }): Float32Array
+
+ // Blend multiple emotions
+ blendEmotions([
+   { vector: preset1, weight: 0.7 },
+   { vector: preset2, weight: 0.3 },
+ ]): Float32Array
+
+ // Linear interpolation
+ lerpEmotion(from, to, t): Float32Array
+
+ // Get preset copy
+ getEmotionPreset('happy'): Float32Array
+ ```
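+
+ A runnable sketch combining these helpers (the weights are illustrative):
+
+ ```typescript
+ import {
+   createEmotionVector,
+   blendEmotions,
+   lerpEmotion,
+   getEmotionPreset,
+ } from '@omote/core';
+
+ const joyful = createEmotionVector({ joy: 0.8, amazement: 0.2 });
+ const sad = getEmotionPreset('sad');
+
+ // 70/30 weighted blend of the two vectors
+ const blended = blendEmotions([
+   { vector: joyful, weight: 0.7 },
+   { vector: sad, weight: 0.3 },
+ ]);
+
+ // Halfway point of a linear interpolation
+ const midway = lerpEmotion(joyful, sad, 0.5);
+ ```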
+
+ ## ARKit Blendshapes
+
+ 52 output blendshapes compatible with ARKit:
+
+ ```
+ eyeBlinkLeft, eyeLookDownLeft, eyeLookInLeft, eyeLookOutLeft, eyeLookUpLeft,
+ eyeSquintLeft, eyeWideLeft, eyeBlinkRight, eyeLookDownRight, eyeLookInRight,
+ eyeLookOutRight, eyeLookUpRight, eyeSquintRight, eyeWideRight,
+ jawForward, jawLeft, jawRight, jawOpen,
+ mouthClose, mouthFunnel, mouthPucker, mouthLeft, mouthRight,
+ mouthSmileLeft, mouthSmileRight, mouthFrownLeft, mouthFrownRight,
+ mouthDimpleLeft, mouthDimpleRight, mouthStretchLeft, mouthStretchRight,
+ mouthRollLower, mouthRollUpper, mouthShrugLower, mouthShrugUpper,
+ mouthPressLeft, mouthPressRight, mouthLowerDownLeft, mouthLowerDownRight,
+ mouthUpperUpLeft, mouthUpperUpRight,
+ browDownLeft, browDownRight, browInnerUp, browOuterUpLeft, browOuterUpRight,
+ cheekPuff, cheekSquintLeft, cheekSquintRight,
+ noseSneerLeft, noseSneerRight, tongueOut
+ ```
+
+ ## Technical Specifications
+
+ ### Audio Input
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Sample Rate | 16kHz |
+ | Format | Float32Array or Int16Array |
+
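+ If your capture path produces Int16Array PCM, the usual normalization to the [-1, 1) Float32Array range looks like this (plain TypeScript, not an SDK API):
+
+ ```typescript
+ // Convert 16-bit PCM to normalized float samples
+ function int16ToFloat32(pcm: Int16Array): Float32Array {
+   const out = new Float32Array(pcm.length);
+   for (let i = 0; i < pcm.length; i++) {
+     out[i] = pcm[i] / 32768; // Scale to [-1, 1)
+   }
+   return out;
+ }
+ ```
+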
+ ### Wav2Vec2 (LAM) Model
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Input | 16kHz audio samples |
+ | Output | 52 ARKit blendshapes per frame |
+ | Frame Rate | 30fps |
+ | Backend | WebGPU / WASM |
+
+ ### DistilHuBERT Emotion Labels
+
+ ```
+ angry, disgusted, fearful, happy, neutral, sad, surprised
+ ```
+
+ ## AI Conversation (Platform Integration)
+
+ For production deployments with the Omote Platform, use the `AgentCoreAdapter`, which handles:
+
+ - WebSocket connection to AgentCore backend
+ - Local Whisper ASR for speech-to-text
+ - TTS audio playback from the backend (ElevenLabs handled server-side)
+ - Local LAM inference for lip sync animation
+
+ ```typescript
+ import { AgentCoreAdapter, ConversationOrchestrator } from '@omote/core';
+
+ const orchestrator = new ConversationOrchestrator({
+   adapter: {
+     endpoint: 'wss://your-agentcore-endpoint.com/ws',
+     models: {
+       lamUrl: '/models/lam-wav2vec2.onnx',
+     },
+   },
+ });
+
+ // Register tenant
+ orchestrator.registerTenant({
+   tenantId: 'tenant-123',
+   characterId: 'character-abc',
+   credentials: { authToken: 'jwt-token' },
+ });
+
+ // Create session
+ const session = await orchestrator.createSession('tenant-123', {
+   systemPrompt: 'You are a helpful assistant.',
+ });
+
+ // Listen for animation events
+ orchestrator.on('animation', ({ blendshapes }) => {
+   applyToAvatar(blendshapes);
+ });
+
+ // Push audio from microphone
+ session.pushAudio(audioSamples);
+ ```
+
+ ## Browser Support
+
+ | Browser | WebGPU | WASM Fallback |
+ |---------|--------|---------------|
+ | Chrome 113+ | Yes | Yes |
+ | Edge 113+ | Yes | Yes |
+ | Firefox | No | Yes |
+ | Safari 18+ | Yes | Yes |
+
+ The SDK auto-detects WebGPU support and falls back to WASM when unavailable.
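+
+ With `backend: 'auto'` the check happens internally; to inspect it yourself, a plain feature-detection sketch:
+
+ ```typescript
+ import { Wav2Vec2Inference } from '@omote/core';
+
+ // navigator.gpu is only defined where WebGPU is available
+ const backend = 'gpu' in navigator ? 'webgpu' : 'wasm';
+ const lam = new Wav2Vec2Inference({ modelUrl: '/models/lam-wav2vec2.onnx', backend });
+ ```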
+
+ ## iOS Support
+
+ iOS Safari exposes the WebGPU API, but ONNX Runtime faces memory and threading limitations there. The SDK provides automatic detection and optimized fallbacks:
+
+ | Feature | iOS Status | Alternative |
+ |---------|------------|-------------|
+ | **VAD** | Works (0.9ms) | Use as-is |
+ | **ASR** | Slow (1.3s) | `SafariSpeechRecognition` |
+ | **Lip Sync** | Slow (332ms) | Lambda LAM (server-side) |
+
+ ```typescript
+ import { shouldUseNativeASR, SafariSpeechRecognition, WhisperInference } from '@omote/core';
+
+ // Platform-aware ASR
+ if (shouldUseNativeASR()) {
+   const speech = new SafariSpeechRecognition({ language: 'en-US' });
+   speech.onResult((result) => console.log(result.text));
+   await speech.start();
+ } else {
+   const whisper = new WhisperInference({ model: 'tiny' });
+   await whisper.load();
+ }
+ ```
+
+ See the [iOS Integration Guide](../../docs/ios-integration.md) for complete setup including Lambda LAM deployment.
+
+ ## License
+
+ MIT