PyPI - wyoming-microsoft-stt - Versions diffs - 1.3.3__tar.gz - Mend

wyoming-microsoft-stt 1.3.3__tar.gz

Files changed (22)

wyoming_microsoft_stt-1.3.3/PKG-INFO ADDED

@@ -0,0 +1,94 @@
+Metadata-Version: 2.4
+Name: wyoming-microsoft-stt
+Version: 1.3.3
+Summary: Add your description here
+Home-page: https://github.com/hugobloem/wyoming-microsoft-stt
+Author: Hugo Bloem
+Author-email:
+Requires-Python: >=3.13
+Description-Content-Type: text/markdown
+Requires-Dist: azure-cognitiveservices-speech>=1.45.0
+Requires-Dist: pydantic>=2.11.7
+Requires-Dist: wyoming>=1.7.2
+Dynamic: author
+Dynamic: home-page
+# Wyoming Microsoft STT
+Wyoming protocol server for Microsoft Azure speech-to-text.
+This Python package provides a Wyoming integration for Microsoft Azure speech-to-text and can be directly used with [Home Assistant](https://www.home-assistant.io/) voice and [Rhasspy](https://github.com/rhasspy/rhasspy3).
+## Azure Speech Service
+This program uses [Microsoft Azure Speech Service](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/). You can sign up for a free Azure account, which comes with a free tier of 5 audio hours per month. This should be enough for running a voice assistant, as each command is relatively short. Once this amount is exceeded, Azure could charge you for each second used (current pricing is $0.36 per audio hour). I am not responsible for any incurred charges and recommend you set up a spending limit to reduce your exposure. However, for normal usage the free tier could suffice, and the resource should not switch to a paid service automatically.
+If you have not set up a speech resource yet, follow the instructions below. You only need to do this once; the same resource works for both [Speech-to-Text](https://github.com/hugobloem/wyoming-microsoft-stt) and [Text-to-Speech](https://github.com/hugobloem/wyoming-microsoft-tts).
+1. Sign in or create an account on [portal.azure.com](https://portal.azure.com).
+2. Create a subscription by searching for `subscription` in the search bar. [Consult Microsoft Learn for more information](https://learn.microsoft.com/en-gb/azure/cost-management-billing/manage/create-subscription#create-a-subscription-in-the-azure-portal).
+3. Create a speech resource by searching for `speech service`.
+4. Select the subscription you created, pick or create a resource group, select a region, pick an identifiable name, and select the pricing tier (you probably want Free F0)
+5. Once created, copy one of the keys from the speech service page. You will need this to run this program.
+## Usage
+Depending on the installation method, parameters are parsed differently. However, the same options are used for each of the installation methods and can be found in the table below. Your service region and subscription key can be found on the speech service resource page (step 5 of the Azure Speech Service instructions).
+For the bare-metal Python install, the program is run as follows:
+```sh
+python -m wyoming_microsoft_stt --<key> <value>
+```
+| Key | Optional | Description |
+|---|---|---|
+| `service-region` | No | Azure service region e.g., `uksouth` |
+| `subscription-key` | No | Azure subscription key |
+| `language` | Yes | Default language to set for transcription, default: `en-GB`. For auto-detection provide multiple languages. |
+| `uri` | No | Uri where the server will be broadcasted e.g., `tcp://0.0.0.0:10300` |
+| `download-dir` | Yes | Directory to download models into (default: ) |
+| `update-languages` | Yes | Download latest languages.json during startup |
+| `debug` | Yes | Log debug messages |
+## Multi-language support
+This add-on can also auto-detect the spoken language from a list of pre-defined languages (max. 10). To do this in Home Assistant, provide the languages separated by semicolons, like so:
+<img width="689" alt="Screenshot 2025-05-04 at 11 59 55" src="https://github.com/user-attachments/assets/b3c54fe5-ebf3-404a-a8e8-b0d27efaf76d" />
+> [!NOTE]
+> Setting multiple languages will override the options set by Home Assistant's Voice configuration! It will prompt you to select a language but the option is ignored when speech is processed.
+## Installation
+Depending on your use case there are different installation options.
+- **Using pip**
+  Clone the repository and install the package using pip. Note the [platform requirements](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi&pivots=programming-language-python#platform-requirements) for the Azure Speech SDK.
+  ```sh
+  pip install .
+  ```
+- **Home Assistant Add-On**
+  Add the following repository as an add-on repository to your Home Assistant, or click the button below.
+  [https://github.com/hugobloem/homeassistant-addons](https://github.com/hugobloem/homeassistant-addons)
+  [![Open your Home Assistant instance and show the add add-on repository dialog with a specific repository URL pre-filled.](https://my.home-assistant.io/badges/supervisor_add_addon_repository.svg)](https://my.home-assistant.io/redirect/supervisor_add_addon_repository/?repository_url=https%3A%2F%2Fgithub.com%2Fhugobloem%2Fhomeassistant-addons)
+- **Docker container**
+  To run as a Docker container use the following command:
+  ```bash
+  docker run ghcr.io/hugobloem/wyoming-microsoft-stt-noha:latest --<key> <value>
+  ```
+  For the relevant keys, see [the table in the Usage section](#usage).
+- **docker compose**
+  Below is a sample docker compose file. The Azure region and subscription key can be set through environment variables; everything else must be passed as command-line arguments.
+  ```yaml
+  wyoming-proxy-azure-stt:
+    image: ghcr.io/hugobloem/wyoming-microsoft-stt-noha
+    container_name: wyoming-azure-stt
+    ports:
+      - "10300:10300"
+    environment:
+      AZURE_SERVICE_REGION: swedencentral
+      AZURE_SUBSCRIPTION_KEY: XXX
+    command: --language=en-GB,nl-NL --uri=tcp://0.0.0.0:10300
+  ```

wyoming_microsoft_stt-1.3.3/README.md ADDED

@@ -0,0 +1,79 @@
+# Wyoming Microsoft STT
+Wyoming protocol server for Microsoft Azure speech-to-text.
+This Python package provides a Wyoming integration for Microsoft Azure speech-to-text and can be directly used with [Home Assistant](https://www.home-assistant.io/) voice and [Rhasspy](https://github.com/rhasspy/rhasspy3).
+## Azure Speech Service
+This program uses [Microsoft Azure Speech Service](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/). You can sign up for a free Azure account, which comes with a free tier of 5 audio hours per month. This should be enough for running a voice assistant, as each command is relatively short. Once this amount is exceeded, Azure could charge you for each second used (current pricing is $0.36 per audio hour). I am not responsible for any incurred charges and recommend you set up a spending limit to reduce your exposure. However, for normal usage the free tier could suffice, and the resource should not switch to a paid service automatically.
+If you have not set up a speech resource yet, follow the instructions below. You only need to do this once; the same resource works for both [Speech-to-Text](https://github.com/hugobloem/wyoming-microsoft-stt) and [Text-to-Speech](https://github.com/hugobloem/wyoming-microsoft-tts).
+1. Sign in or create an account on [portal.azure.com](https://portal.azure.com).
+2. Create a subscription by searching for `subscription` in the search bar. [Consult Microsoft Learn for more information](https://learn.microsoft.com/en-gb/azure/cost-management-billing/manage/create-subscription#create-a-subscription-in-the-azure-portal).
+3. Create a speech resource by searching for `speech service`.
+4. Select the subscription you created, pick or create a resource group, select a region, pick an identifiable name, and select the pricing tier (you probably want Free F0)
+5. Once created, copy one of the keys from the speech service page. You will need this to run this program.
+## Usage
+Depending on the installation method, parameters are parsed differently. However, the same options are used for each of the installation methods and can be found in the table below. Your service region and subscription key can be found on the speech service resource page (step 5 of the Azure Speech Service instructions).
+For the bare-metal Python install, the program is run as follows:
+```sh
+python -m wyoming_microsoft_stt --<key> <value>
+```
+| Key | Optional | Description |
+|---|---|---|
+| `service-region` | No | Azure service region e.g., `uksouth` |
+| `subscription-key` | No | Azure subscription key |
+| `language` | Yes | Default language to set for transcription, default: `en-GB`. For auto-detection provide multiple languages. |
+| `uri` | No | Uri where the server will be broadcasted e.g., `tcp://0.0.0.0:10300` |
+| `download-dir` | Yes | Directory to download models into (default: ) |
+| `update-languages` | Yes | Download latest languages.json during startup |
+| `debug` | Yes | Log debug messages |
+## Multi-language support
+This add-on can also auto-detect the spoken language from a list of pre-defined languages (max. 10). To do this in Home Assistant, provide the languages separated by semicolons, like so:
+<img width="689" alt="Screenshot 2025-05-04 at 11 59 55" src="https://github.com/user-attachments/assets/b3c54fe5-ebf3-404a-a8e8-b0d27efaf76d" />
+> [!NOTE]
+> Setting multiple languages will override the options set by Home Assistant's Voice configuration! It will prompt you to select a language but the option is ignored when speech is processed.
+## Installation
+Depending on your use case there are different installation options.
+- **Using pip**
+  Clone the repository and install the package using pip. Note the [platform requirements](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform?tabs=linux%2Cubuntu%2Cdotnetcli%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi&pivots=programming-language-python#platform-requirements) for the Azure Speech SDK.
+  ```sh
+  pip install .
+  ```
+- **Home Assistant Add-On**
+  Add the following repository as an add-on repository to your Home Assistant, or click the button below.
+  [https://github.com/hugobloem/homeassistant-addons](https://github.com/hugobloem/homeassistant-addons)
+  [![Open your Home Assistant instance and show the add add-on repository dialog with a specific repository URL pre-filled.](https://my.home-assistant.io/badges/supervisor_add_addon_repository.svg)](https://my.home-assistant.io/redirect/supervisor_add_addon_repository/?repository_url=https%3A%2F%2Fgithub.com%2Fhugobloem%2Fhomeassistant-addons)
+- **Docker container**
+  To run as a Docker container use the following command:
+  ```bash
+  docker run ghcr.io/hugobloem/wyoming-microsoft-stt-noha:latest --<key> <value>
+  ```
+  For the relevant keys, see [the table in the Usage section](#usage).
+- **docker compose**
+  Below is a sample docker compose file. The Azure region and subscription key can be set through environment variables; everything else must be passed as command-line arguments.
+  ```yaml
+  wyoming-proxy-azure-stt:
+    image: ghcr.io/hugobloem/wyoming-microsoft-stt-noha
+    container_name: wyoming-azure-stt
+    ports:
+      - "10300:10300"
+    environment:
+      AZURE_SERVICE_REGION: swedencentral
+      AZURE_SUBSCRIPTION_KEY: XXX
+    command: --language=en-GB,nl-NL --uri=tcp://0.0.0.0:10300
+  ```
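Reviewer note: the README's option table and the compose sample describe two ways of supplying settings (command-line flags and environment variables). The sketch below shows one way such parsing could look with `argparse`. It is not the package's own parser, which is not part of this excerpt; the function names `build_parser` and `parse_args` are hypothetical, and the environment-variable fallback for the region and key is an assumption based only on the compose sample. Passing `--language` as space-separated values follows how the tests in this diff invoke the module.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the README option table (not the package's code)."""
    parser = argparse.ArgumentParser(prog="wyoming_microsoft_stt")
    # Assumption: region and key fall back to the variables named in the compose sample.
    parser.add_argument(
        "--service-region",
        default=os.environ.get("AZURE_SERVICE_REGION"),
        help="Azure service region, e.g. uksouth",
    )
    parser.add_argument(
        "--subscription-key",
        default=os.environ.get("AZURE_SUBSCRIPTION_KEY"),
        help="Azure subscription key",
    )
    # One language selects it directly; several enable auto-detection.
    parser.add_argument("--language", nargs="+", default=["en-GB"])
    parser.add_argument("--uri", required=True, help="e.g. tcp://0.0.0.0:10300")
    parser.add_argument("--debug", action="store_true", help="Log debug messages")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and enforce the constraints stated in the README."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.service_region or not args.subscription_key:
        parser.error("service region and subscription key are required")
    if len(args.language) > 10:
        parser.error("at most 10 languages can be given for auto-detection")
    return args
```

Calling `parse_args(["--uri", "stdio://", "--service-region", "uksouth", "--subscription-key", "XXX", "--language", "en-GB", "nl-NL"])` would yield a namespace whose `language` attribute is the list `["en-GB", "nl-NL"]`.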

wyoming_microsoft_stt-1.3.3/pyproject.toml ADDED

@@ -0,0 +1,64 @@
+[project]
+name = "wyoming-microsoft-stt"
+version = "1.3.3"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "azure-cognitiveservices-speech>=1.45.0",
+    "pydantic>=2.11.7",
+    "wyoming>=1.7.2",
+]
+[dependency-groups]
+dev = [
+    "pytest>=8.4.1",
+    "pytest-asyncio>=1.1.0",
+    "ruff>=0.12.10",
+]
+[tool.ruff]
+lint.select = [
+    "B007", # Loop control variable {name} not used within loop body
+    "B014", # Exception handler with duplicate exception
+    "C",  # complexity
+    "D",  # docstrings
+    "E",  # pycodestyle
+    "F",  # pyflakes/autoflake
+    "ICN001", # import conventions; {name} should be imported as {asname}
+    "PGH004",  # Use specific rule codes when using noqa
+    "PLC0414", # Useless import alias. Import alias does not rename original package.
+    "SIM105", # Use contextlib.suppress({exception}) instead of try-except-pass
+    "SIM117", # Merge with-statements that use the same scope
+    "SIM118", # Use {key} in {dict} instead of {key} in {dict}.keys()
+    "SIM201", # Use {left} != {right} instead of not {left} == {right}
+    "SIM212", # Use {a} if {a} else {b} instead of {b} if not {a} else {a}
+    "SIM300", # Yoda conditions. Use 'age == 42' instead of '42 == age'.
+    "SIM401", # Use get from dict with default instead of an if block
+    "T20",  # flake8-print
+    "TRY004", # Prefer TypeError exception for invalid type
+    "RUF006", # Store a reference to the return value of asyncio.create_task
+    "UP",  # pyupgrade
+    "W",  # pycodestyle
+]
+lint.ignore = [
+    "D202",  # No blank lines allowed after function docstring
+    "D203",  # 1 blank line required before class docstring
+    "D213",  # Multi-line docstring summary should start at the second line
+    "D404",  # First word of the docstring should not be This
+    "D406",  # Section name should end with a newline
+    "D407",  # Section name underlining
+    "D411",  # Missing blank line before section
+    "E501",  # line too long
+    "E731",  # do not assign a lambda expression, use a def
+]
+[tool.ruff.lint.flake8-pytest-style]
+fixture-parentheses = false
+[tool.ruff.lint.pyupgrade]
+keep-runtime-typing = true
+[tool.ruff.lint.mccabe]
+max-complexity = 25

wyoming_microsoft_stt-1.3.3/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

wyoming_microsoft_stt-1.3.3/setup.py ADDED

@@ -0,0 +1,44 @@
+from pathlib import Path  # noqa: D100
+import setuptools
+from setuptools import setup
+this_dir = Path(__file__).parent
+module_dir = this_dir / "wyoming_microsoft_stt"
+requirements = []
+requirements_path = this_dir / "requirements.txt"
+if requirements_path.is_file():
+    with open(requirements_path, encoding="utf-8") as requirements_file:
+        requirements = requirements_file.read().splitlines()
+data_files = [module_dir / "languages.json"]
+# -----------------------------------------------------------------------------
+setup(
+    name="wyoming_microsoft_stt",
+    version="1.3.3",
+    description="Wyoming Server for Microsoft STT",
+    url="https://github.com/hugobloem/wyoming-microsoft-stt",
+    author="Hugo Bloem",
+    author_email="",
+    license="MIT",
+    packages=setuptools.find_packages(),
+    package_data={
+        "wyoming_microsoft_stt": [str(p.relative_to(module_dir)) for p in data_files]
+    },
+    install_requires=requirements,
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Intended Audience :: Developers",
+        "Topic :: Text Processing :: Linguistic",
+        "License :: OSI Approved :: MIT License",
+        "Programming Language :: Python :: 3.13",
+    ],
+    keywords="rhasspy wyoming microsoft stt",
+)

wyoming_microsoft_stt-1.3.3/tests/__init__.py ADDED

@@ -0,0 +1 @@
+"""Tests."""

wyoming_microsoft_stt-1.3.3/tests/conftest.py ADDED

@@ -0,0 +1,15 @@
+"""Fixtures for tests."""
+from wyoming_microsoft_stt import SpeechConfig
+import pytest
+import os
+@pytest.fixture
+def microsoft_stt_args():
+    """Return a SpeechConfig built from the SPEECH_KEY and SPEECH_REGION environment variables."""
+    args = SpeechConfig(
+        subscription_key=os.environ.get("SPEECH_KEY"),
+        service_region=os.environ.get("SPEECH_REGION"),
+    )
+    return args

wyoming_microsoft_stt-1.3.3/tests/test_microsoft_stt.py ADDED

@@ -0,0 +1,19 @@
+"""Tests for the MicrosoftSTT class."""
+from wyoming_microsoft_stt.microsoft_stt import MicrosoftSTT
+def test_initialize(microsoft_stt_args):
+    """Test initialization."""
+    microsoft_stt = MicrosoftSTT(microsoft_stt_args)
+    assert microsoft_stt.speech_config is not None
+def test_set_profanity(microsoft_stt_args):
+    """Test set_profanity."""
+    microsoft_stt = MicrosoftSTT(microsoft_stt_args)
+    assert microsoft_stt.speech_config is not None
+    profanity = "masked"
+    microsoft_stt.set_profanity(profanity)
+    # There is currently no way to check the set profanity level

wyoming_microsoft_stt-1.3.3/tests/test_multilanguage.py ADDED

@@ -0,0 +1,109 @@
+"""Tests for the Microsoft STT service."""
+import asyncio
+import re
+import sys
+import os
+import wave
+from asyncio.subprocess import PIPE
+from pathlib import Path
+import pytest
+from wyoming.asr import Transcript
+from wyoming.audio import AudioStart, AudioStop, wav_to_chunks
+from wyoming.event import async_read_event, async_write_event
+from wyoming.info import Describe, Info
+import logging
+_LOGGER = logging.getLogger(__name__)
+_DIR = Path(__file__).parent
+_PROGRAM_DIR = _DIR.parent
+_LOCAL_DIR = _PROGRAM_DIR / "local"
+_SAMPLES_PER_CHUNK = 1024
+# Need to give time for the model to download
+_START_TIMEOUT = 60
+_TRANSCRIBE_TIMEOUT = 60
+@pytest.mark.asyncio
+async def test_multilanguage() -> None:
+    """Test the transcription."""
+    proc = await asyncio.create_subprocess_exec(
+        sys.executable,
+        "-m",
+        "wyoming_microsoft_stt",
+        "--uri",
+        "stdio://",
+        "--language",
+        "en-GB",
+        "nl-NL",
+        "--service-region",
+        os.environ.get("SPEECH_REGION"),
+        "--subscription-key",
+        os.environ.get("SPEECH_KEY"),
+        "--debug",
+        stdin=PIPE,
+        stdout=PIPE,
+    )
+    assert proc.stdin is not None
+    assert proc.stdout is not None
+    # Check info
+    await async_write_event(Describe().event(), proc.stdin)
+    while True:
+        event = await asyncio.wait_for(
+            async_read_event(proc.stdout), timeout=_START_TIMEOUT
+        )
+        assert event is not None
+        if not Info.is_type(event.type):
+            continue
+        info = Info.from_event(event)
+        assert len(info.asr) == 1, "Expected one asr service"
+        asr = info.asr[0]
+        assert len(asr.models) > 0, "Expected at least one model"
+        break
+    # Test known WAV
+    with wave.open(str(_DIR / "zet_het_licht_aan.wav"), "rb") as example_wav:
+        await async_write_event(
+            AudioStart(
+                rate=example_wav.getframerate(),
+                width=example_wav.getsampwidth(),
+                channels=example_wav.getnchannels(),
+            ).event(),
+            proc.stdin,
+        )
+        for chunk in wav_to_chunks(example_wav, _SAMPLES_PER_CHUNK):
+            await async_write_event(chunk.event(), proc.stdin)
+            _LOGGER.info("Sent bytes of audio data to the server")
+        await async_write_event(AudioStop().event(), proc.stdin)
+        _LOGGER.info("Sent audio stop event to the server")
+    while True:
+        event = await asyncio.wait_for(
+            async_read_event(proc.stdout), timeout=_TRANSCRIBE_TIMEOUT
+        )
+        assert event is not None
+        if not Transcript.is_type(event.type):
+            continue
+        transcript = Transcript.from_event(event)
+        _LOGGER.info(f"Received transcript: {transcript.text}")
+        text = transcript.text.lower().strip()
+        text = re.sub(r"[^a-z ]", "", text)
+        assert text == "zet het licht aan"
+        break
+    # Need to close stdin for graceful termination
+    proc.stdin.close()
+    _, stderr = await proc.communicate()
+    assert proc.returncode == 0, stderr.decode()

wyoming_microsoft_stt-1.3.3/tests/test_transcribe.py ADDED

@@ -0,0 +1,114 @@
+"""Tests for the Microsoft STT service."""
+import asyncio
+import re
+import sys
+import os
+import wave
+from asyncio.subprocess import PIPE
+from pathlib import Path
+import pytest
+from wyoming.asr import Transcript
+from wyoming.audio import AudioStart, AudioStop, wav_to_chunks
+from wyoming.event import async_read_event, async_write_event
+from wyoming.info import Describe, Info
+import logging
+_LOGGER = logging.getLogger(__name__)
+_DIR = Path(__file__).parent
+_PROGRAM_DIR = _DIR.parent
+_LOCAL_DIR = _PROGRAM_DIR / "local"
+_SAMPLES_PER_CHUNK = 1024
+# Need to give time for the model to download
+_START_TIMEOUT = 60
+_TRANSCRIBE_TIMEOUT = 60
+@pytest.mark.asyncio
+async def test_transcribe() -> None:
+    """Test the transcription."""
+    proc = await asyncio.create_subprocess_exec(
+        sys.executable,
+        "-m",
+        "wyoming_microsoft_stt",
+        "--uri",
+        "stdio://",
+        "--language",
+        "en-GB",
+        "--service-region",
+        os.environ.get("SPEECH_REGION"),
+        "--subscription-key",
+        os.environ.get("SPEECH_KEY"),
+        "--debug",
+        stdin=PIPE,
+        stdout=PIPE,
+    )
+    assert proc.stdin is not None
+    assert proc.stdout is not None
+    # Check info
+    await async_write_event(Describe().event(), proc.stdin)
+    while True:
+        event = await asyncio.wait_for(
+            async_read_event(proc.stdout), timeout=_START_TIMEOUT
+        )
+        assert event is not None
+        if not Info.is_type(event.type):
+            continue
+        info = Info.from_event(event)
+        assert len(info.asr) == 1, "Expected one asr service"
+        asr = info.asr[0]
+        assert len(asr.models) > 0, "Expected at least one model"
+        break
+    # Test known WAV
+    with wave.open(str(_DIR / "long_text.wav"), "rb") as example_wav:
+        await async_write_event(
+            AudioStart(
+                rate=example_wav.getframerate(),
+                width=example_wav.getsampwidth(),
+                channels=example_wav.getnchannels(),
+            ).event(),
+            proc.stdin,
+        )
+        for chunk in wav_to_chunks(example_wav, _SAMPLES_PER_CHUNK):
+            await async_write_event(chunk.event(), proc.stdin)
+            _LOGGER.info("Sent bytes of audio data to the server")
+        await async_write_event(AudioStop().event(), proc.stdin)
+        _LOGGER.info("Sent audio stop event to the server")
+    while True:
+        event = await asyncio.wait_for(
+            async_read_event(proc.stdout), timeout=_TRANSCRIBE_TIMEOUT
+        )
+        assert event is not None
+        if not Transcript.is_type(event.type):
+            continue
+        transcript = Transcript.from_event(event)
+        text = transcript.text.lower().strip()
+        text = re.sub(r"[^a-z ]", "", text)
+        _LOGGER.info(f"Received transcript: {text}")
+        original_text = "The Netherlands, informally Holland, is a country in Northwestern Europe with overseas territories in the Caribbean. It is the largest of the four constituent countries of the Kingdom of the Netherlands. The Netherlands consists of 12 provinces. It borders Germany to the east and Belgium to the south, with the North Sea coastline to the north and west. It shares maritime borders with the United Kingdom, Germany, and Belgium."
+        # Remove punctuation and convert to lowercase
+        original_text = original_text.lower()
+        original_text = re.sub(r"[^a-z ]", "", original_text)
+        assert text == original_text
+        break
+    # Need to close stdin for graceful termination
+    proc.stdin.close()
+    _, stderr = await proc.communicate()
+    assert proc.returncode == 0, stderr.decode()
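
Reviewer note: both transcription tests normalise text the same way before comparing (lowercase, strip, then drop every character that is not `a-z` or a space). A minimal sketch of that step as a shared helper follows; the name `normalize_transcript` is hypothetical and does not appear in the package.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase, trim, and remove every character except a-z and spaces."""
    return re.sub(r"[^a-z ]", "", text.lower().strip())


print(normalize_transcript("Zet het licht aan."))  # prints: zet het licht aan
```

Because the pattern keeps only ASCII letters and spaces, digits and accented letters are removed as well. For example, "12 provinces" becomes " provinces" with its leading space kept, so a comparison is only meaningful when both the transcript and the reference text pass through the same function.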