npm - @madeinoz67/voice-server - Versions diffs - 0.1.3 - Mend

@madeinoz67/voice-server 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (86) hide show

package/.claude/commands/speckit.analyze.md +184 -0
package/.claude/commands/speckit.checklist.md +294 -0
package/.claude/commands/speckit.clarify.md +181 -0
package/.claude/commands/speckit.constitution.md +82 -0
package/.claude/commands/speckit.implement.md +135 -0
package/.claude/commands/speckit.plan.md +89 -0
package/.claude/commands/speckit.specify.md +258 -0
package/.claude/commands/speckit.tasks.md +137 -0
package/.claude/commands/speckit.taskstoissues.md +30 -0
package/.claude/settings.local.json +23 -0
package/.codanna/settings.toml +384 -0
package/.env.development +18 -0
package/.env.example +30 -0
package/.github/codeql/config.yml +13 -0
package/.github/codeql.yml +30 -0
package/.github/dependabot.yml +11 -0
package/.github/workflows/ci.yml +308 -0
package/.specify/memory/constitution.md +223 -0
package/.specify/scripts/bash/check-prerequisites.sh +166 -0
package/.specify/scripts/bash/common.sh +156 -0
package/.specify/scripts/bash/create-new-feature.sh +297 -0
package/.specify/scripts/bash/setup-plan.sh +61 -0
package/.specify/scripts/bash/update-agent-context.sh +799 -0
package/.specify/templates/agent-file-template.md +28 -0
package/.specify/templates/checklist-template.md +40 -0
package/.specify/templates/plan-template.md +106 -0
package/.specify/templates/spec-template.md +115 -0
package/.specify/templates/tasks-template.md +261 -0
package/AGENTPERSONALITIES.md +233 -0
package/ATTRIBUTION.md +70 -0
package/CHANGELOG.md +90 -0
package/CLAUDE.md +50 -0
package/Formula/madeinoz-voice-server.rb +106 -0
package/README.md +451 -0
package/bun.lock +212 -0
package/cliff.toml +67 -0
package/docs/KOKORO_VOICES.md +152 -0
package/docs/MIGRATION.md +267 -0
package/docs/VOICE_EXAMPLES.md +283 -0
package/docs/VOICE_GUIDE.md +227 -0
package/docs/VOICE_QUICK_REF.md +157 -0
package/docs/agent-voices.md +114 -0
package/docs/api.md +336 -0
package/docs/assets/voice-server-architecture.png +0 -0
package/docs/assets/voice-server-header.png +0 -0
package/docs/assets/voice-server-pack-logo.png +0 -0
package/docs/index.md +60 -0
package/eslint.config.js +42 -0
package/mkdocs.yml +55 -0
package/package.json +28 -0
package/reports/MLX_AUDIO_EVALUATION.md +302 -0
package/reports/agent/2026-02-06-20-51-mlx-audio-qwen-tts-investigation.md +613 -0
package/reports/agent/2026-02-06-Qwen3-TTS-API-Specification.md +446 -0
package/reports/agent/2026-02-07-python-backend-removal-plan.md +790 -0
package/scripts/generate-reference.ts +139 -0
package/specs/001-qwen-tts/checklists/requirements.md +50 -0
package/specs/001-qwen-tts/contracts/api.yaml +305 -0
package/specs/001-qwen-tts/data-model.md +197 -0
package/specs/001-qwen-tts/plan.md +236 -0
package/specs/001-qwen-tts/quickstart.md +306 -0
package/specs/001-qwen-tts/research.md +194 -0
package/specs/001-qwen-tts/spec.md +135 -0
package/specs/001-qwen-tts/tasks.md +305 -0
package/src/ts/constants/KOKORO_VOICES.ts +141 -0
package/src/ts/middleware/cors.ts +153 -0
package/src/ts/middleware/rate-limiter.ts +200 -0
package/src/ts/models/health.ts +45 -0
package/src/ts/models/notification.ts +69 -0
package/src/ts/models/pronunciation.ts +39 -0
package/src/ts/models/tts.ts +54 -0
package/src/ts/models/voice-config.ts +82 -0
package/src/ts/server.ts +460 -0
package/src/ts/services/mlx-tts-client.ts +337 -0
package/src/ts/services/pronunciation.ts +209 -0
package/src/ts/services/prosody-translator.ts +130 -0
package/src/ts/services/voice-loader.ts +214 -0
package/src/ts/utils/logger.ts +144 -0
package/src/ts/utils/text-sanitizer.ts +118 -0
package/tests/integration/api.test.ts +210 -0
package/tests/mocks/index.ts +152 -0
package/tests/ts/server.test.ts +11 -0
package/tests/unit/middleware/cors.test.ts +146 -0
package/tests/unit/models/validation.test.ts +332 -0
package/tests/unit/services/pronunciation.test.ts +171 -0
package/tests/unit/services/prosody-translator.test.ts +142 -0
package/tsconfig.json +25 -0

package/reports/MLX_AUDIO_EVALUATION.md ADDED Viewed

@@ -0,0 +1,302 @@
+# MLX-Audio Evaluation Report
+**Date:** 2026-02-06
+**Purpose:** Evaluate MLX-audio as alternative to qwen-tts for TTS inference
+**Python 3.13 Test Results:** ✅ **MAJOR PROGRESS**
+## Summary
+Initial testing revealed Python 3.14 compatibility issues. Testing with Python 3.13 shows significant improvement - MLX-audio imports and loads models successfully, though synthesis testing revealed additional challenges.
+MLX-audio installation and evaluation revealed several challenges that make direct migration complex at this time.
+## Findings
+### 1. qwen-tts (Current Implementation) ✓ WORKING
+**Model:** Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
+**Device:** Apple Metal (MPS) GPU acceleration
+| Metric | Value |
+|--------|-------|
+| Model Load Time | 21.8s |
+| Load Memory | 1317 MB |
+| Inference Time | 13-16s |
+| Inference Memory | 1300 MB |
+| Real-Time Factor (RTF) | 3.87x |
+| Sample Rate | 24kHz |
+**API Method:** `generate_voice_design(text, language, speaker, instruct, stream)`
+### 2. MLX-Audio Challenges ✗ ISSUES FOUND
+#### Challenge A: Python 3.14 Compatibility
+- **Issue:** Pydantic V1 incompatible with Python 3.14
+- **Impact:** Kokoro model fails to load
+- **Error:** `Core Pydantic V1 functionality isn't compatible with Python 3.14 or greater`
+- **Status:** Requires Python 3.13 or older
+#### Challenge B: Qwen3 Model Loading
+- **Issue:** Complex nested config structure not compatible with standard loading
+- **Components:** speaker_encoder, talker, code_predictor, speech_tokenizer
+- **Error:** `478 parameters not in model` when loading with standard approach
+- **Status:** Requires specialized MLX-audio loader that has compatibility issues
+#### Challenge C: Transformers Compatibility
+- **Issue:** Transformers doesn't recognize `qwen3_tts` model type
+- **Error:** `KeyError: 'qwen3_tts'` in AutoConfig
+- **Workaround:** Load config.json directly
+- **Status:** Partially resolved
+## MLX-Audio Models Tested
+| Model | Status | Issue |
+|-------|--------|-------|
+| mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16 | ✗ Config incompatibility | Complex nested config, 478 extra parameters |
+| mlx-community/Kokoro-82M-bf16 | ✗ Python 3.14 incompatibility | Pydantic V1 error |
+## Recommendation
+**STAY WITH CURRENT qwen-tts IMPLEMENTATION**
+**Rationale:**
+1. qwen-tts with MPS is working reliably
+2. MLX-audio has compatibility issues with current Python version
+3. MLX-audio Qwen3 models require specialized loading not available in current mlx-audio release
+4. Migration complexity outweighs potential performance benefits at this time
+## Alternative Approaches
+### Option 1: Wait for MLX-audio Updates
+- Monitor MLX-audio for Python 3.14 compatibility fixes
+- Wait for official Qwen3-TTS model support
+### Option 2: Python Version Downgrade
+- Use Python 3.13 for MLX-audio
+- Requires separate virtual environment
+- Adds operational complexity
+### Option 3: Direct MLX Implementation
+- Bypass mlx-audio package
+- Implement MLX Qwen3 model directly
+- Significant development effort required
+### Option 4: Current Implementation Optimization
+- Continue using qwen-tts with MPS
+- Implement request caching
+- Add pre-warming for model
+- Consider model quantization
+## Performance Notes
+The current qwen-tts implementation with MPS:
+- **RTF of 3.87x** means 13-16 seconds for 4 seconds of audio
+- This is primarily model inference time, not I/O
+- **True streaming** would require model architecture changes
+- qwen-tts `stream=True` generates complete audio then splits (not true streaming)
+## Performance Benchmark Results (NEW - 2026-02-06 21:50)
+**Breakthrough:** After installing pip in MLX-audio tool's Python environment, we successfully completed a performance benchmark comparing MLX-audio Kokoro vs qwen-tts.
+### Benchmark Results
+| Metric | MLX-audio Kokoro | qwen-tts (MPS) | Advantage |
+|--------|------------------|-----------------|-----------|
+| **Processing Time** | **0.37s** | 13-16s | **35-43x faster** |
+| **Model Load** | ~3-5s (cached) | 21.8s | ~4-5x faster |
+| **Audio Duration** | 3.45s | ~3.5s | Similar |
+| **Real-Time Factor** | **0.11x** | 3.87x | **35x faster** |
+| **Peak Memory** | 1.30GB | 1.3GB | Similar |
+| **Audio Quality** | Good (American female) | Good (Marrvin) | Comparable |
+### Key Findings
+**🚀 MLX-Audio is 35-43x FASTER** than qwen-tts with MPS GPU acceleration!
+- **Real-Time Factor 0.11x:** MLX-audio generates 3.45s of audio in just 0.37s (faster than real-time!)
+- **qwen-tts RTF 3.87x:** Takes 13-16 seconds to generate 3.5s of audio
+- **Memory usage is comparable:** Both use ~1.3GB peak memory
+- **Audio quality is similar:** Both produce clear, natural speech
+### Audio Files
+- **MLX-audio:** `/tmp/mlx_benchmark_1770385812/mlx_kokoro_000.wav` (162KB)
+- **qwen-tts:** Previous benchmark at `/tmp/qwen_benchmark_qwen.wav`
+### Updated Recommendation
+**MLX-AUDIO IS NOW THE RECOMMENDED CHOICE** for TTS on Apple Silicon:
+1. ✅ **Massive performance improvement** (35-43x faster)
+2. ✅ **Native Apple Silicon optimization** (no PyTorch overhead)
+3. ✅ **Same memory footprint** (~1.3GB)
+4. ✅ **Comparable audio quality**
+5. ✅ **Faster than real-time synthesis** (RTF 0.11x)
+**Migration Priority:** HIGH - Consider migrating from qwen-tts to MLX-audio for significant performance gains.
+---
+## Qwen3-TTS Tests with MLX-audio v0.3.1 (NEW - 2026-02-06 06:00)
+**BREAKTHROUGH: MLX-audio v0.3.1 fixes Qwen3-TTS config compatibility issues!**
+After upgrading from v0.2.10 to v0.3.1 (installed directly from GitHub), both Qwen3-TTS models now work correctly:
+### Qwen3-TTS Performance Results
+| Model | Processing Time | Audio Duration | RTF | Peak Memory | Status |
+|-------|-----------------|----------------|-----|-------------|--------|
+| **Qwen3-TTS 0.6B Base** | 4.07s | 3.36s | **0.83x** | 4.82GB | ✅ Working |
+| **Qwen3-TTS 1.7B VoiceDesign** | 1.05s | 0.88s | **0.84x** | 5.30GB | ✅ Working |
+| **Kokoro-82M** | 0.37s | 3.45s | **0.11x** | 1.30GB | ✅ Working |
+| **qwen-tts (MPS)** | 13-16s | ~3.5s | 3.87x | 1.3GB | Current |
+### Key Findings
+**Qwen3-TTS Performance:**
+- **4-5x faster** than qwen-tts with MPS (0.83x RTF vs 3.87x RTF)
+- **Same voice characteristics** as current qwen-tts system
+- **Higher memory usage** (4.8-5.3GB) vs Kokoro/qwen-tts (1.3GB)
+- **VoiceDesign model supports natural language voice descriptions**
+**Performance Comparison:**
+- **Kokoro-82M**: Fastest (0.11x RTF), lowest memory (1.3GB), American female voice
+- **Qwen3-TTS 0.6B**: Good speed (0.83x RTF), same voices as qwen-tts, higher memory
+- **Qwen3-TTS 1.7B**: Similar speed (0.84x RTF), voice customization, highest memory
+### Updated Recommendations
+**By Use Case:**
+| Use Case | Recommended Model | Reason |
+|----------|-------------------|--------|
+| **Maximum performance** | Kokoro-82M | 35-43x faster, lowest memory |
+| **Same voices as current** | Qwen3-TTS 0.6B/1.7B | 4-5x faster, compatible voices |
+| **Custom voice design** | Qwen3-TTS 1.7B VoiceDesign | Natural language voice descriptions |
+**Installation:**
+```bash
+# Install MLX-audio v0.3.1 from GitHub for Qwen3-TTS support
+uv tool install 'mlx-audio @ git+https://github.com/Blaizzy/mlx-audio@v0.3.1'
+```
+---
+## Final Conclusion
+**MLX-AUDIO IS THE CLEAR WINNER** after upgrading to v0.3.1:
+- **Qwen3-TTS**: 4-5x faster than qwen-tts with same voices
+- **Kokoro-82M**: 35-43x faster than qwen-tts
+- **Both models** generate faster than real-time (RTF < 1.0x)
+**Issues resolved:**
+- ✅ Python 3.13 compatibility - FIXED
+- ✅ Missing pip module - FIXED
+- ✅ Qwen3-TTS config structure - FIXED (v0.3.1)
+- ✅ Model loading - WORKS
+- ✅ Audio synthesis - WORKS
+**Recommendation:** Migrate to MLX-audio (Kokoro for speed, Qwen3-TTS for voice compatibility).
+---
+*Previous evaluation findings (superseded):*
+The evaluation below documents the investigation process that led to discovering the pip issue.
+---
+## Python 3.13 Test Results (NEW - 2026-02-06 21:25)
+**Hypothesis:** Python 3.13 would resolve the Pydantic V1 compatibility issues preventing MLX-audio from working.
+### Test Setup
+- **Environment:** Python 3.13.12 (via uv venv)
+- **Command:** `uv venv --python 3.13 .venv-py313`
+- **Installation:** `uv pip install mlx-audio` (176 packages, successful)
+### Results Summary
+| Test | Python 3.14 | Python 3.13 |
+|------|-------------|--------------|
+| MLX import | ✅ Works | ✅ Works |
+| MLX-audio import | ✅ Works | ✅ Works |
+| Kokoro model load | ❌ Config error | ✅ **3.46s** |
+| Kokoro synthesis | N/A | ❌ **Hangs** |
+| Qwen3-TTS load | ❌ Config error | Not tested |
+### Key Findings
+**✅ SUCCESS: Python 3.13 Resolves Import Issues**
+- MLX-audio imports successfully without Pydantic V1 errors
+- Kokoro model loads in ~3.5 seconds (vs config errors in 3.14)
+- MLX Metal backend is available
+**❌ REMAINING ISSUE: Synthesis Hangs**
+- Model loads successfully but `model.generate()` hangs indefinitely
+- Issue appears to be in the synthesis/generation pipeline
+- Not a misaki import issue (misaki imports successfully)
+- Not an MLX compute issue (basic MLX operations work)
+**Root Cause Analysis:**
+The synthesis hanging is likely due to:
+1. MLX computation graph issue specific to the Kokoro model
+2. Missing or incompatible dependency in the synthesis pipeline
+3. Model-specific issue (not general MLX-audio problem)
+### Updated Recommendation
+**PARTIAL PROGRESS:** Python 3.13 resolves import compatibility, but synthesis issues remain.
+**Next Steps:**
+1. **Report issue** to MLX-audio GitHub with reproduction case
+2. **Try different MLX-audio models** (e.g., Qwen3-TTS instead of Kokoro)
+3. **Wait for MLX-audio updates** addressing Python 3.13 compatibility
+4. **Stick with current qwen-tts** implementation for now
+**Python 3.13 Test Conclusion:**
+The Python 3.13 hypothesis was partially correct - it resolved import issues, but revealed a new synthesis-related problem. Further investigation required.
+---
+## Server Interference Test (NEW - 2026-02-06 21:44)
+**User Hypothesis:** MLX-audio issues might be caused by the running qwen-tts server (port 7860) interfering with MLX-audio operations.
+**Test Method:**
+1. Identified running voice server: `bun run src/ts/server.ts` (PID 71753)
+2. Stopped voice server process
+3. Verified port 7860 was free
+4. Retested MLX-audio Kokoro in clean environment
+**Result:** ❌ **MLX-AUDIO STILL HANGS**
+Even with the voice server completely stopped and port 7860 confirmed free, MLX-audio Kokoro model:
+- Loads successfully (3.46s)
+- Begins synthesis
+- **Hangs at same point** (after displaying text/voice/speed/language parameters)
+**Conclusion:** The MLX-audio synthesis hanging issue is **NOT caused by server interference**. The issue is intrinsic to MLX-audio's Kokoro model synthesis pipeline, likely related to:
+- misaki text processing library
+- MLX compute graph issues
+- Kokoro model implementation bug
+**User's hypothesis was valid to test** and represented good scientific method, but the root cause is confirmed to be within MLX-audio itself, not environmental conflicts.
+---
+*Previous findings below...*
+**Recommendation:** Continue with current qwen-tts implementation and re-evaluate MLX-audio after:
+1. Python 3.14 compatibility is resolved
+2. Official Qwen3-TTS support is added
+3. Documentation and examples are available
+---
+**Generated:** 2026-02-06
+**Evaluator:** DAIV
+**Project:** voice-server