@omote/core 0.10.5 → 0.10.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -15,7 +15,7 @@
15
15
  - **SpeechListener** — Mic → VAD → ASR orchestration with adaptive silence detection
16
16
  - **createTTSPlayer()** — Factory composing Kokoro TTS + TTSSpeaker for zero-config playback
17
17
  - **VoiceOrchestrator** — Full conversational agent loop with local TTS support (cloud or offline)
18
- - **configureOrtCdn()** — Enterprise CDN override for ORT WASM/WebGPU binaries
18
+ - **configureModelUrls()** — Self-host model files from your own CDN
19
19
  - **Animation Graph** — State machine (idle/listening/thinking/speaking) with emotion blending
20
20
  - **Emotion Controller** — Preset-based emotion system with smooth transitions
21
21
  - **Model Caching** — IndexedDB with versioning, LRU eviction, and quota monitoring
@@ -81,15 +81,15 @@ const { blendshapes } = await a2e.infer(audioSamples); // Float32Array (16kHz)
81
81
  // → 52 ARKit blendshape weights
82
82
  ```
83
83
 
84
- #### Direct API
84
+ #### Custom Configuration
85
85
 
86
86
  ```typescript
87
- import { A2EInference, ARKIT_BLENDSHAPES } from '@omote/core';
87
+ import { createA2E, ARKIT_BLENDSHAPES } from '@omote/core';
88
88
 
89
- const lam = new A2EInference({ modelUrl: '/models/model_fp16.onnx' });
90
- await lam.load();
89
+ const a2e = createA2E({ backend: 'wasm' }); // Force WASM for testing
90
+ await a2e.load();
91
91
 
92
- const { blendshapes } = await lam.infer(audioSamples);
92
+ const { blendshapes } = await a2e.infer(audioSamples);
93
93
  const jawOpen = blendshapes[ARKIT_BLENDSHAPES.indexOf('jawOpen')];
94
94
  ```
95
95
 
@@ -136,11 +136,9 @@ const frame = processor.getFrameForTime(audioContext.currentTime);
136
136
  SenseVoice ASR — 15x faster than Whisper, with progressive transcription and emotion detection.
137
137
 
138
138
  ```typescript
139
- import { SenseVoiceInference } from '@omote/core';
139
+ import { createSenseVoice } from '@omote/core';
140
140
 
141
- const asr = new SenseVoiceInference({
142
- modelUrl: '/models/sensevoice/model.int8.onnx',
143
- });
141
+ const asr = createSenseVoice(); // Auto-detects platform, fetches from HF CDN
144
142
  await asr.load();
145
143
 
146
144
  const { text, emotion, language } = await asr.transcribe(audioSamples);
@@ -149,22 +147,19 @@ const { text, emotion, language } = await asr.transcribe(audioSamples);
149
147
  #### Platform-Aware ASR
150
148
 
151
149
  ```typescript
152
- import { shouldUseNativeASR, SafariSpeechRecognition, SenseVoiceInference } from '@omote/core';
150
+ import { shouldUseNativeASR, SafariSpeechRecognition, createSenseVoice } from '@omote/core';
153
151
 
154
152
  const asr = shouldUseNativeASR()
155
153
  ? new SafariSpeechRecognition({ language: 'en-US' })
156
- : new SenseVoiceInference({ modelUrl: '/models/sensevoice/model.int8.onnx' });
154
+ : createSenseVoice();
157
155
  ```
158
156
 
159
157
  ### Voice Activity Detection (Silero VAD)
160
158
 
161
- #### Factory API (Recommended)
162
-
163
159
  ```typescript
164
160
  import { createSileroVAD } from '@omote/core';
165
161
 
166
162
  const vad = createSileroVAD({
167
- modelUrl: '/models/silero-vad.onnx',
168
163
  threshold: 0.5,
169
164
  // useWorker: true // Force off-main-thread
170
165
  // useWorker: false // Force main thread
@@ -174,18 +169,6 @@ await vad.load();
174
169
  const { isSpeech, probability } = await vad.process(audioSamples);
175
170
  ```
176
171
 
177
- #### Direct API
178
-
179
- ```typescript
180
- import { SileroVADInference, SileroVADWorker } from '@omote/core';
181
-
182
- // Main thread (mobile-friendly)
183
- const vad = new SileroVADInference({ modelUrl: '/models/silero-vad.onnx' });
184
-
185
- // Web Worker (desktop, off-main-thread)
186
- const vadWorker = new SileroVADWorker({ modelUrl: '/models/silero-vad.onnx' });
187
- ```
188
-
189
172
  ### Animation Graph
190
173
 
191
174
  State machine for avatar animation states with emotion blending and audio energy.
@@ -248,7 +231,7 @@ const data = await fetchWithCache('/models/model.onnx', {
248
231
  });
249
232
 
250
233
  // Cache quota monitoring
251
- import { configureCacheLimit, getQuotaInfo } from '@omote/core';
234
+ import { configureCacheLimit } from '@omote/core';
252
235
 
253
236
  configureCacheLimit({
254
237
  maxSizeBytes: 500 * 1024 * 1024, // 500MB limit
@@ -307,17 +290,76 @@ const span = telemetry.startSpan('custom-operation');
307
290
  span.end();
308
291
  ```
309
292
 
310
- ## Models
293
+ ### Text-to-Speech (Kokoro TTS)
294
+
295
+ ```typescript
296
+ import { createKokoroTTS } from '@omote/core';
311
297
 
312
- Place models in your public assets directory:
298
+ const tts = createKokoroTTS({ defaultVoice: 'af_heart' });
299
+ await tts.load();
313
300
 
301
+ const audio = await tts.synthesize('Hello world!');
302
+ // audio: Float32Array @ 24kHz
314
303
  ```
315
- public/models/
316
- model_fp16.onnx # A2E lip sync WebGPU (192MB fp16, from omote-ai/lam-a2e)
317
- sensevoice/model.int8.onnx # SenseVoice ASR (239MB)
318
- silero-vad.onnx # Voice activity detection (~2MB)
304
+
305
+ Kokoro auto-detects the platform: the mixed-fp16 WebGPU model (156MB) on Chrome/Edge, and the q8 WASM model (92MB) on Safari/iOS/Firefox.
306
+
307
+ ### Eager Load & Warmup
308
+
309
+ Use `eagerLoad` to preload models at construction time:
310
+
311
+ ```typescript
312
+ const tts = createKokoroTTS({ eagerLoad: true }); // Starts loading immediately
319
313
  ```
320
314
 
315
+ Use `warmup()` to prime the AudioContext and satisfy the iOS/Safari autoplay policy. Call it from a user gesture handler:
316
+
317
+ ```typescript
318
+ button.onclick = async () => {
319
+ await avatar.warmup(); // Primes AudioContext
320
+ await avatar.connectVoice({ ... });
321
+ };
322
+ ```
323
+
324
+ ### Observability
325
+
326
+ The SDK includes built-in OpenTelemetry-compatible tracing and metrics:
327
+
328
+ ```typescript
329
+ import { configureTelemetry, getTelemetry, MetricNames } from '@omote/core';
330
+
331
+ configureTelemetry({
332
+ enabled: true,
333
+ serviceName: 'my-app',
334
+ exporter: 'console', // or OTLPExporter for production
335
+ });
336
+ ```
337
+
338
+ All inference calls, model loads, cache operations, and voice turns are automatically instrumented.
339
+
340
+ ## Models
341
+
342
+ All models are fetched from the HuggingFace CDN by default and auto-downloaded on first use. To self-host them, use `configureModelUrls()`:
343
+
344
+ ```typescript
345
+ import { configureModelUrls } from '@omote/core';
346
+
347
+ configureModelUrls({
348
+ lam: 'https://your-cdn.com/models/lam.onnx',
349
+ lamData: 'https://your-cdn.com/models/lam.onnx.data',
350
+ senseVoice: 'https://your-cdn.com/models/sensevoice.onnx',
351
+ sileroVad: 'https://your-cdn.com/models/silero_vad.onnx',
352
+ });
353
+ ```
354
+
355
+ | Model | HuggingFace Repo | Size |
356
+ |-------|-------------------|------|
357
+ | LAM A2E | `omote-ai/lam-a2e` | `lam.onnx` (230KB) + `lam.onnx.data` (192MB) |
358
+ | SenseVoice | `omote-ai/sensevoice-asr` | 228MB |
359
+ | Silero VAD | `deepghs/silero-vad-onnx` | ~2MB |
360
+ | Kokoro TTS (WASM) | `onnx-community/Kokoro-82M-v1.0-ONNX` | 92MB q8 |
361
+ | Kokoro TTS (WebGPU) | `omote-ai/kokoro-tts` | 156MB mixed-fp16 |
362
+
321
363
  ## Browser Compatibility
322
364
 
323
365
  WebGPU-first with automatic WASM fallback.