PyPI - converse-framework - Versions diffs - 0.2.0__tar.gz → 0.2.3__tar.gz - Mend

converse-framework 0.2.0__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (57)

{converse_framework-0.2.0 → converse_framework-0.2.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: converse-framework
-Version: 0.2.0
+Version: 0.2.3
 Summary: Provider-agnostic speech stack for speech-to-speech applications
 License: MIT
 License-File: LICENSE
@@ -22,12 +22,15 @@ Requires-Dist: nvidia-cublas-cu12; (platform_system == 'Windows') and extra == '
 Provides-Extra: all-llm
 Requires-Dist: httpx>=0.28; extra == 'all-llm'
 Provides-Extra: all-tts
+Requires-Dist: httpx>=0.28; extra == 'all-tts'
 Requires-Dist: kokoro-onnx>=0.5; (python_version < '3.14') and extra == 'all-tts'
 Requires-Dist: misaki>=0.7; extra == 'all-tts'
 Requires-Dist: pocket-tts>=2.1; extra == 'all-tts'
 Provides-Extra: all-vad
 Requires-Dist: onnxruntime>=1.20; extra == 'all-vad'
 Requires-Dist: silero-vad>=6.0; extra == 'all-vad'
+Provides-Extra: audio-cpp
+Requires-Dist: httpx>=0.28; extra == 'audio-cpp'
 Provides-Extra: faster-whisper
 Requires-Dist: faster-whisper>=1.2; extra == 'faster-whisper'
 Requires-Dist: nvidia-cublas-cu12; (platform_system == 'Windows') and extra == 'faster-whisper'
@@ -49,6 +52,38 @@ Description-Content-Type: text/markdown
 Provider-agnostic speech stack for speech-to-speech applications.
+## Table of Contents
+- [Install](#install)
+  - [Missing dependency behavior](#missing-dependency-behavior)
+  - [Python version compatibility](#python-version-compatibility)
+- [Quick Start](#quick-start)
+  - [Provider status semantics](#provider-status-semantics)
+- [Recipes](#recipes)
+  - [Minimal mock text pipeline](#minimal-mock-text-pipeline)
+  - [Audio frame to utterance collector to pipeline](#audio-frame-to-utterance-collector-to-pipeline)
+  - [Custom provider registration](#custom-provider-registration)
+  - [Custom event sink](#custom-event-sink)
+  - [Browser playback](#browser-playback-js-reference-client)
+  - [Browser microphone capture](#browser-microphone-capture-js-reference-client)
+  - [Mobile browser microphone testing](#mobile-browser-microphone-testing)
+  - [Wrap an external CLI as a provider](#wrap-an-external-cli-as-a-provider)
+  - [Pocket TTS voice listing and configuration](#pocket-tts-voice-listing-and-configuration)
+  - [CUDA DLL helper](#cuda-dll-helper-windows)
+- [Runtime Provider Updates](#runtime-provider-updates)
+  - [ProviderBundle.replace()](#providerbundlereplace)
+  - [ProviderBundle.unload_replaced()](#providerbundleunload_replaced)
+  - [SpeechPipeline.update_providers()](#speechpipelineupdate_providers)
+  - [AudioUtteranceCollector.update_vad_provider()](#audioutterancecollectorupdate_vad_provider)
+  - [End-to-end pattern](#end-to-end-pattern)
+- [WebSocket Session Helper](#websocket-session-helper)
+- [Examples](#examples)
+  - [Text chat](#text-chat-automated-test-covered)
+  - [Voice chat](#voice-chat-manual)
+- [Framework / App Boundary](#framework--app-boundary)
+  - [Transport boundary](#transport-boundary)
+- [Status](#status)
 ## Install
 ```bash
@@ -62,6 +97,7 @@ providers live behind optional extras:
 pip install converse-framework[silero]          # Silero VAD
 pip install converse-framework[faster-whisper]  # faster-whisper ASR
 pip install converse-framework[whisper-cpp]     # whisper.cpp HTTP ASR
+pip install converse-framework[audio-cpp]       # audio.cpp HTTP ASR + TTS
 pip install converse-framework[llamacpp]        # llama.cpp HTTP LLM
 pip install converse-framework[kokoro]          # Kokoro ONNX TTS
 pip install converse-framework[pocket-tts]      # Pocket TTS
@@ -115,12 +151,13 @@ own constraints (the table below mirrors the markers in
 | `faster-whisper` | 3.11+ | The `nvidia-cublas-cu12` wheel pins Windows. |
 | `llamacpp` | 3.11+ | `httpx` itself supports 3.9+, so 3.11+ is the only constraint. |
 | `whisper-cpp` | 3.11+ | Only needs `httpx`, which supports 3.9+. |
+| `audio-cpp` | 3.11+ | Only needs `httpx`. Talks to a user-managed `audiocpp_server`. |
 | `kokoro` | 3.11 to <3.14 | `kokoro-onnx` 0.5.0 requires Python <3.14. The wheel build fails fast on 3.14+. |
 | `pocket-tts` | 3.11+ | No known upper bound. |
 The `kokoro` extra is the only one with an upper-bound marker today.
-If you are on Python 3.14+ and need a TTS provider, use `pocket-tts`
-or a mock provider. New providers should add their own
+If you are on Python 3.14+ and need a TTS provider, use `pocket-tts`,
+`audio-cpp`, or a mock provider. New providers should add their own
 `python_version` markers in `pyproject.toml` when their backend has a
 known limit.
@@ -644,7 +681,7 @@ for v in voices:
 Change voice (clears only the voice cache, preserves the loaded model):
 ```python
-result = provider.configure(voice="galileo")
+result = provider.configure(voice="anna")
 print(result.changed, result.requires_reload)
 # True, False — model stays, voice state reloaded
 ```
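The PKG-INFO hunk adds a new `audio-cpp` extra and adds `httpx>=0.28` to `all-tts`. The sketch below groups `Requires-Dist` entries by extra so you can see which extras pull in a given dependency. It uses only the standard library and assumes a hand-copied excerpt of the 0.2.3 headers. The helper name `requirements_by_extra` is hypothetical. It reads only the `extra == '...'` clause of each marker. Evaluating the full marker, such as `python_version < '3.14'`, needs a marker evaluator like `packaging.markers`.

```python
import re
from collections import defaultdict
from email.parser import Parser

# Hand-copied excerpt of the 0.2.3 PKG-INFO headers (an assumption, not the full file).
PKG_INFO = """\
Metadata-Version: 2.4
Name: converse-framework
Version: 0.2.3
Provides-Extra: all-llm
Requires-Dist: httpx>=0.28; extra == 'all-llm'
Provides-Extra: all-tts
Requires-Dist: httpx>=0.28; extra == 'all-tts'
Requires-Dist: kokoro-onnx>=0.5; (python_version < '3.14') and extra == 'all-tts'
Requires-Dist: misaki>=0.7; extra == 'all-tts'
Requires-Dist: pocket-tts>=2.1; extra == 'all-tts'
Provides-Extra: audio-cpp
Requires-Dist: httpx>=0.28; extra == 'audio-cpp'
"""

# Captures the extra name from an `extra == 'name'` marker clause.
EXTRA_RE = re.compile(r"extra\s*==\s*['\"]([^'\"]+)['\"]")


def requirements_by_extra(pkg_info: str) -> dict[str, list[str]]:
    """Group Requires-Dist specifiers by the extra named in their marker.

    Requirements without an extra clause are collected under the key "".
    """
    # PKG-INFO uses RFC 822-style headers, so the stdlib email parser reads it.
    msg = Parser().parsestr(pkg_info, headersonly=True)
    grouped: dict[str, list[str]] = defaultdict(list)
    for req in msg.get_all("Requires-Dist") or []:
        spec, _, marker = req.partition(";")
        match = EXTRA_RE.search(marker)
        grouped[match.group(1) if match else ""].append(spec.strip())
    return dict(grouped)
```

For this excerpt, `requirements_by_extra(PKG_INFO)["audio-cpp"]` returns `["httpx>=0.28"]`. That matches the README's note that the extra only needs `httpx`.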

{converse_framework-0.2.0 → converse_framework-0.2.3}/README.md RENAMED

@@ -2,6 +2,38 @@
 Provider-agnostic speech stack for speech-to-speech applications.
+## Table of Contents
+- [Install](#install)
+  - [Missing dependency behavior](#missing-dependency-behavior)
+  - [Python version compatibility](#python-version-compatibility)
+- [Quick Start](#quick-start)
+  - [Provider status semantics](#provider-status-semantics)
+- [Recipes](#recipes)
+  - [Minimal mock text pipeline](#minimal-mock-text-pipeline)
+  - [Audio frame to utterance collector to pipeline](#audio-frame-to-utterance-collector-to-pipeline)
+  - [Custom provider registration](#custom-provider-registration)
+  - [Custom event sink](#custom-event-sink)
+  - [Browser playback](#browser-playback-js-reference-client)
+  - [Browser microphone capture](#browser-microphone-capture-js-reference-client)
+  - [Mobile browser microphone testing](#mobile-browser-microphone-testing)
+  - [Wrap an external CLI as a provider](#wrap-an-external-cli-as-a-provider)
+  - [Pocket TTS voice listing and configuration](#pocket-tts-voice-listing-and-configuration)
+  - [CUDA DLL helper](#cuda-dll-helper-windows)
+- [Runtime Provider Updates](#runtime-provider-updates)
+  - [ProviderBundle.replace()](#providerbundlereplace)
+  - [ProviderBundle.unload_replaced()](#providerbundleunload_replaced)
+  - [SpeechPipeline.update_providers()](#speechpipelineupdate_providers)
+  - [AudioUtteranceCollector.update_vad_provider()](#audioutterancecollectorupdate_vad_provider)
+  - [End-to-end pattern](#end-to-end-pattern)
+- [WebSocket Session Helper](#websocket-session-helper)
+- [Examples](#examples)
+  - [Text chat](#text-chat-automated-test-covered)
+  - [Voice chat](#voice-chat-manual)
+- [Framework / App Boundary](#framework--app-boundary)
+  - [Transport boundary](#transport-boundary)
+- [Status](#status)
 ## Install
 ```bash
@@ -15,6 +47,7 @@ providers live behind optional extras:
 pip install converse-framework[silero]          # Silero VAD
 pip install converse-framework[faster-whisper]  # faster-whisper ASR
 pip install converse-framework[whisper-cpp]     # whisper.cpp HTTP ASR
+pip install converse-framework[audio-cpp]       # audio.cpp HTTP ASR + TTS
 pip install converse-framework[llamacpp]        # llama.cpp HTTP LLM
 pip install converse-framework[kokoro]          # Kokoro ONNX TTS
 pip install converse-framework[pocket-tts]      # Pocket TTS
@@ -68,12 +101,13 @@ own constraints (the table below mirrors the markers in
 | `faster-whisper` | 3.11+ | The `nvidia-cublas-cu12` wheel pins Windows. |
 | `llamacpp` | 3.11+ | `httpx` itself supports 3.9+, so 3.11+ is the only constraint. |
 | `whisper-cpp` | 3.11+ | Only needs `httpx`, which supports 3.9+. |
+| `audio-cpp` | 3.11+ | Only needs `httpx`. Talks to a user-managed `audiocpp_server`. |
 | `kokoro` | 3.11 to <3.14 | `kokoro-onnx` 0.5.0 requires Python <3.14. The wheel build fails fast on 3.14+. |
 | `pocket-tts` | 3.11+ | No known upper bound. |
 The `kokoro` extra is the only one with an upper-bound marker today.
-If you are on Python 3.14+ and need a TTS provider, use `pocket-tts`
-or a mock provider. New providers should add their own
+If you are on Python 3.14+ and need a TTS provider, use `pocket-tts`,
+`audio-cpp`, or a mock provider. New providers should add their own
 `python_version` markers in `pyproject.toml` when their backend has a
 known limit.
@@ -597,7 +631,7 @@ for v in voices:
 Change voice (clears only the voice cache, preserves the loaded model):
 ```python
-result = provider.configure(voice="galileo")
+result = provider.configure(voice="anna")
 print(result.changed, result.requires_reload)
 # True, False — model stays, voice state reloaded
 ```

{converse_framework-0.2.0 → converse_framework-0.2.3}/converse_framework/audio_utils.py RENAMED

@@ -175,6 +175,53 @@ def float_audio_to_wav_bytes(audio, sample_rate: int) -> bytes:
     return buffer.getvalue()
+def wav_bytes_to_pcm_s16le(
+    wav_bytes: bytes,
+) -> tuple[bytes, int, int]:
+    """Decode a WAV byte string back into raw PCM s16le bytes and shape.
+    The inverse of :func:`float_audio_to_wav_bytes`: this reads a
+    complete 16-bit PCM WAV file from ``bytes`` and returns the raw
+    signed little-endian PCM body along with the sample rate and channel
+    count declared in the header. Providers that fetch WAV audio from
+    an HTTP backend (e.g. the audio.cpp ``/v1/audio/speech`` endpoint)
+    use this to turn the response into a wire-ready
+    :class:`~converse_framework.protocols.AudioChunk`.
+    Args:
+        wav_bytes: A complete WAV file as ``bytes`` (RIFF header + data).
+    Returns:
+        A ``(pcm_s16le, sample_rate, channels)`` tuple. ``pcm_s16le`` is
+        the raw 16-bit signed LE PCM bytes with no header;
+        ``sample_rate`` and ``channels`` come from the ``fmt `` chunk.
+        An empty or non-WAV input returns ``(b"", 0, 0)`` rather than
+        raising so callers can treat a failed decode as "no audio".
+    Raises:
+        ValueError: If ``wav_bytes`` starts like a WAV stream but cannot
+            be parsed, or if it is not 16-bit PCM audio.
+    """
+    if not wav_bytes or wav_bytes[:4] != b"RIFF":
+        return b"", 0, 0
+    buffer = BytesIO(wav_bytes)
+    try:
+        with wave.open(buffer, "rb") as wav:
+            sample_rate = wav.getframerate()
+            channels = wav.getnchannels()
+            sample_width = wav.getsampwidth()
+            if wav.getcomptype() != "NONE" or sample_width != 2:
+                bits = sample_width * 8
+                raise ValueError(
+                    f"unsupported WAV format: expected 16-bit PCM, got {bits}-bit "
+                    f"{wav.getcompname()}"
+                )
+            pcm = wav.readframes(wav.getnframes())
+    except wave.Error as exc:
+        raise ValueError(f"invalid WAV stream: {exc}") from exc
+    return pcm, sample_rate, channels
 def float_audio_to_pcm_s16le_bytes(audio) -> bytes:
     """Encode a float audio buffer as raw 16-bit signed LE PCM bytes.
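The new `wav_bytes_to_pcm_s16le` helper is the inverse of `float_audio_to_wav_bytes`. To show its contract without installing the package, the sketch below restates its logic as a standalone function, `decode_s16le_wav`. It then round-trips a WAV file built with the stdlib `wave` module. `decode_s16le_wav`, `make_wav`, and `duration_seconds` are hypothetical names. The sketch makes one assumption beyond the diff: it also catches `EOFError`, which `wave.open` can raise when the input is cut off inside the 8-byte RIFF header. The duration arithmetic relies on s16le using 2 bytes per sample per channel.

```python
import wave
from io import BytesIO


def make_wav(pcm: bytes, sample_rate: int, channels: int, sample_width: int = 2) -> bytes:
    """Build an in-memory WAV file around raw PCM frames."""
    buffer = BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def decode_s16le_wav(wav_bytes: bytes) -> tuple[bytes, int, int]:
    """Standalone restatement of the decode contract from the hunk above."""
    # Empty or non-RIFF input means "no audio" rather than an error.
    if not wav_bytes or wav_bytes[:4] != b"RIFF":
        return b"", 0, 0
    try:
        with wave.open(BytesIO(wav_bytes), "rb") as wav:
            if wav.getcomptype() != "NONE" or wav.getsampwidth() != 2:
                raise ValueError("unsupported WAV format: expected 16-bit PCM")
            return wav.readframes(wav.getnframes()), wav.getframerate(), wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"invalid WAV stream: {exc}") from exc


def duration_seconds(pcm: bytes, sample_rate: int, channels: int) -> float:
    """Seconds of audio in an s16le buffer: bytes / (rate * channels * 2)."""
    if sample_rate == 0 or channels == 0:
        return 0.0
    return len(pcm) / (sample_rate * channels * 2)
```

For example, 400 bytes of mono 16 kHz s16le audio decode back unchanged and last 400 / (16000 × 1 × 2) = 0.0125 s.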

{converse_framework-0.2.0 → converse_framework-0.2.3}/converse_framework/providers/__init__.py RENAMED

@@ -2,9 +2,10 @@
 Mock and unavailable providers are imported eagerly because they have no
 heavy dependencies. The concrete providers (``silero``, ``faster-whisper``,
-``llamacpp``, ``kokoro-onnx``, ``pocket-tts``) are not imported here --
-they are registered with :func:`converse_framework.registry.register_provider`
-by import string and loaded lazily on first use.
+``whisper-cpp``, ``llamacpp``, ``kokoro-onnx``, ``pocket-tts``,
+``audio-cpp``) are not imported here -- they are registered with
+:func:`converse_framework.registry.register_provider` by import string
+and loaded lazily on first use.
 """
 from converse_framework.providers.mock import (
