speechflow 1.7.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (169) hide show
  1. package/CHANGELOG.md +23 -0
  2. package/README.md +425 -146
  3. package/etc/claude.md +5 -5
  4. package/etc/speechflow.yaml +2 -2
  5. package/package.json +3 -3
  6. package/speechflow-cli/dst/speechflow-main-api.js +6 -5
  7. package/speechflow-cli/dst/speechflow-main-api.js.map +1 -1
  8. package/speechflow-cli/dst/speechflow-main-graph.d.ts +1 -0
  9. package/speechflow-cli/dst/speechflow-main-graph.js +35 -13
  10. package/speechflow-cli/dst/speechflow-main-graph.js.map +1 -1
  11. package/speechflow-cli/dst/speechflow-main-status.js +3 -7
  12. package/speechflow-cli/dst/speechflow-main-status.js.map +1 -1
  13. package/speechflow-cli/dst/speechflow-node-a2a-compressor-wt.js +3 -0
  14. package/speechflow-cli/dst/speechflow-node-a2a-compressor-wt.js.map +1 -1
  15. package/speechflow-cli/dst/speechflow-node-a2a-compressor.js +4 -2
  16. package/speechflow-cli/dst/speechflow-node-a2a-compressor.js.map +1 -1
  17. package/speechflow-cli/dst/speechflow-node-a2a-expander-wt.js +1 -1
  18. package/speechflow-cli/dst/speechflow-node-a2a-expander.js +4 -2
  19. package/speechflow-cli/dst/speechflow-node-a2a-expander.js.map +1 -1
  20. package/speechflow-cli/dst/speechflow-node-a2a-gender.js +2 -2
  21. package/speechflow-cli/dst/speechflow-node-a2a-gender.js.map +1 -1
  22. package/speechflow-cli/dst/speechflow-node-a2a-pitch.js +1 -2
  23. package/speechflow-cli/dst/speechflow-node-a2a-pitch.js.map +1 -1
  24. package/speechflow-cli/dst/speechflow-node-a2a-wav.js +32 -5
  25. package/speechflow-cli/dst/speechflow-node-a2a-wav.js.map +1 -1
  26. package/speechflow-cli/dst/speechflow-node-a2t-amazon.d.ts +0 -1
  27. package/speechflow-cli/dst/speechflow-node-a2t-amazon.js +1 -6
  28. package/speechflow-cli/dst/speechflow-node-a2t-amazon.js.map +1 -1
  29. package/speechflow-cli/dst/speechflow-node-a2t-deepgram.d.ts +0 -1
  30. package/speechflow-cli/dst/speechflow-node-a2t-deepgram.js +9 -9
  31. package/speechflow-cli/dst/speechflow-node-a2t-deepgram.js.map +1 -1
  32. package/speechflow-cli/dst/speechflow-node-a2t-google.d.ts +17 -0
  33. package/speechflow-cli/dst/speechflow-node-a2t-google.js +320 -0
  34. package/speechflow-cli/dst/speechflow-node-a2t-google.js.map +1 -0
  35. package/speechflow-cli/dst/speechflow-node-a2t-openai.js +6 -4
  36. package/speechflow-cli/dst/speechflow-node-a2t-openai.js.map +1 -1
  37. package/speechflow-cli/dst/speechflow-node-t2a-amazon.js +6 -11
  38. package/speechflow-cli/dst/speechflow-node-t2a-amazon.js.map +1 -1
  39. package/speechflow-cli/dst/speechflow-node-t2a-elevenlabs.js +6 -5
  40. package/speechflow-cli/dst/speechflow-node-t2a-elevenlabs.js.map +1 -1
  41. package/speechflow-cli/dst/speechflow-node-t2a-google.d.ts +15 -0
  42. package/speechflow-cli/dst/speechflow-node-t2a-google.js +218 -0
  43. package/speechflow-cli/dst/speechflow-node-t2a-google.js.map +1 -0
  44. package/speechflow-cli/dst/speechflow-node-t2a-kokoro.d.ts +2 -0
  45. package/speechflow-cli/dst/speechflow-node-t2a-kokoro.js +19 -6
  46. package/speechflow-cli/dst/speechflow-node-t2a-kokoro.js.map +1 -1
  47. package/speechflow-cli/dst/speechflow-node-t2a-openai.d.ts +15 -0
  48. package/speechflow-cli/dst/speechflow-node-t2a-openai.js +195 -0
  49. package/speechflow-cli/dst/speechflow-node-t2a-openai.js.map +1 -0
  50. package/speechflow-cli/dst/speechflow-node-t2a-supertonic.d.ts +17 -0
  51. package/speechflow-cli/dst/speechflow-node-t2a-supertonic.js +608 -0
  52. package/speechflow-cli/dst/speechflow-node-t2a-supertonic.js.map +1 -0
  53. package/speechflow-cli/dst/speechflow-node-t2t-amazon.js.map +1 -1
  54. package/speechflow-cli/dst/{speechflow-node-t2t-transformers.d.ts → speechflow-node-t2t-opus.d.ts} +1 -3
  55. package/speechflow-cli/dst/speechflow-node-t2t-opus.js +159 -0
  56. package/speechflow-cli/dst/speechflow-node-t2t-opus.js.map +1 -0
  57. package/speechflow-cli/dst/speechflow-node-t2t-profanity.d.ts +11 -0
  58. package/speechflow-cli/dst/speechflow-node-t2t-profanity.js +118 -0
  59. package/speechflow-cli/dst/speechflow-node-t2t-profanity.js.map +1 -0
  60. package/speechflow-cli/dst/speechflow-node-t2t-punctuation.d.ts +13 -0
  61. package/speechflow-cli/dst/speechflow-node-t2t-punctuation.js +220 -0
  62. package/speechflow-cli/dst/speechflow-node-t2t-punctuation.js.map +1 -0
  63. package/speechflow-cli/dst/{speechflow-node-t2t-openai.d.ts → speechflow-node-t2t-spellcheck.d.ts} +2 -2
  64. package/speechflow-cli/dst/{speechflow-node-t2t-openai.js → speechflow-node-t2t-spellcheck.js} +47 -99
  65. package/speechflow-cli/dst/speechflow-node-t2t-spellcheck.js.map +1 -0
  66. package/speechflow-cli/dst/speechflow-node-t2t-subtitle.js +3 -6
  67. package/speechflow-cli/dst/speechflow-node-t2t-subtitle.js.map +1 -1
  68. package/speechflow-cli/dst/speechflow-node-t2t-summary.d.ts +16 -0
  69. package/speechflow-cli/dst/speechflow-node-t2t-summary.js +241 -0
  70. package/speechflow-cli/dst/speechflow-node-t2t-summary.js.map +1 -0
  71. package/speechflow-cli/dst/{speechflow-node-t2t-ollama.d.ts → speechflow-node-t2t-translate.d.ts} +2 -2
  72. package/speechflow-cli/dst/{speechflow-node-t2t-transformers.js → speechflow-node-t2t-translate.js} +53 -115
  73. package/speechflow-cli/dst/speechflow-node-t2t-translate.js.map +1 -0
  74. package/speechflow-cli/dst/speechflow-node-x2x-filter.d.ts +1 -0
  75. package/speechflow-cli/dst/speechflow-node-x2x-filter.js +10 -0
  76. package/speechflow-cli/dst/speechflow-node-x2x-filter.js.map +1 -1
  77. package/speechflow-cli/dst/speechflow-node-x2x-trace.js.map +1 -1
  78. package/speechflow-cli/dst/speechflow-node-xio-device.js +3 -3
  79. package/speechflow-cli/dst/speechflow-node-xio-device.js.map +1 -1
  80. package/speechflow-cli/dst/speechflow-node-xio-exec.d.ts +12 -0
  81. package/speechflow-cli/dst/speechflow-node-xio-exec.js +223 -0
  82. package/speechflow-cli/dst/speechflow-node-xio-exec.js.map +1 -0
  83. package/speechflow-cli/dst/speechflow-node-xio-file.d.ts +1 -0
  84. package/speechflow-cli/dst/speechflow-node-xio-file.js +80 -67
  85. package/speechflow-cli/dst/speechflow-node-xio-file.js.map +1 -1
  86. package/speechflow-cli/dst/speechflow-node-xio-mqtt.js +2 -1
  87. package/speechflow-cli/dst/speechflow-node-xio-mqtt.js.map +1 -1
  88. package/speechflow-cli/dst/speechflow-node-xio-vban.d.ts +17 -0
  89. package/speechflow-cli/dst/speechflow-node-xio-vban.js +330 -0
  90. package/speechflow-cli/dst/speechflow-node-xio-vban.js.map +1 -0
  91. package/speechflow-cli/dst/speechflow-node-xio-webrtc.d.ts +39 -0
  92. package/speechflow-cli/dst/speechflow-node-xio-webrtc.js +500 -0
  93. package/speechflow-cli/dst/speechflow-node-xio-webrtc.js.map +1 -0
  94. package/speechflow-cli/dst/speechflow-node-xio-websocket.js +2 -1
  95. package/speechflow-cli/dst/speechflow-node-xio-websocket.js.map +1 -1
  96. package/speechflow-cli/dst/speechflow-util-audio.js +5 -6
  97. package/speechflow-cli/dst/speechflow-util-audio.js.map +1 -1
  98. package/speechflow-cli/dst/speechflow-util-error.d.ts +1 -1
  99. package/speechflow-cli/dst/speechflow-util-error.js +5 -7
  100. package/speechflow-cli/dst/speechflow-util-error.js.map +1 -1
  101. package/speechflow-cli/dst/speechflow-util-llm.d.ts +35 -0
  102. package/speechflow-cli/dst/speechflow-util-llm.js +363 -0
  103. package/speechflow-cli/dst/speechflow-util-llm.js.map +1 -0
  104. package/speechflow-cli/dst/speechflow-util-misc.d.ts +1 -1
  105. package/speechflow-cli/dst/speechflow-util-misc.js +4 -4
  106. package/speechflow-cli/dst/speechflow-util-misc.js.map +1 -1
  107. package/speechflow-cli/dst/speechflow-util-queue.js +3 -3
  108. package/speechflow-cli/dst/speechflow-util-queue.js.map +1 -1
  109. package/speechflow-cli/dst/speechflow-util-stream.js +4 -2
  110. package/speechflow-cli/dst/speechflow-util-stream.js.map +1 -1
  111. package/speechflow-cli/dst/speechflow-util.d.ts +1 -0
  112. package/speechflow-cli/dst/speechflow-util.js +1 -0
  113. package/speechflow-cli/dst/speechflow-util.js.map +1 -1
  114. package/speechflow-cli/etc/oxlint.jsonc +2 -1
  115. package/speechflow-cli/package.json +34 -17
  116. package/speechflow-cli/src/lib.d.ts +5 -0
  117. package/speechflow-cli/src/speechflow-main-api.ts +6 -5
  118. package/speechflow-cli/src/speechflow-main-graph.ts +40 -13
  119. package/speechflow-cli/src/speechflow-main-status.ts +4 -8
  120. package/speechflow-cli/src/speechflow-node-a2a-compressor-wt.ts +4 -0
  121. package/speechflow-cli/src/speechflow-node-a2a-compressor.ts +4 -2
  122. package/speechflow-cli/src/speechflow-node-a2a-expander-wt.ts +1 -1
  123. package/speechflow-cli/src/speechflow-node-a2a-expander.ts +4 -2
  124. package/speechflow-cli/src/speechflow-node-a2a-gender.ts +2 -2
  125. package/speechflow-cli/src/speechflow-node-a2a-pitch.ts +1 -2
  126. package/speechflow-cli/src/speechflow-node-a2a-wav.ts +33 -6
  127. package/speechflow-cli/src/speechflow-node-a2t-amazon.ts +6 -11
  128. package/speechflow-cli/src/speechflow-node-a2t-deepgram.ts +13 -12
  129. package/speechflow-cli/src/speechflow-node-a2t-google.ts +322 -0
  130. package/speechflow-cli/src/speechflow-node-a2t-openai.ts +8 -4
  131. package/speechflow-cli/src/speechflow-node-t2a-amazon.ts +7 -11
  132. package/speechflow-cli/src/speechflow-node-t2a-elevenlabs.ts +6 -5
  133. package/speechflow-cli/src/speechflow-node-t2a-google.ts +206 -0
  134. package/speechflow-cli/src/speechflow-node-t2a-kokoro.ts +22 -6
  135. package/speechflow-cli/src/speechflow-node-t2a-openai.ts +179 -0
  136. package/speechflow-cli/src/speechflow-node-t2a-supertonic.ts +701 -0
  137. package/speechflow-cli/src/speechflow-node-t2t-amazon.ts +2 -1
  138. package/speechflow-cli/src/speechflow-node-t2t-opus.ts +136 -0
  139. package/speechflow-cli/src/speechflow-node-t2t-profanity.ts +93 -0
  140. package/speechflow-cli/src/speechflow-node-t2t-punctuation.ts +201 -0
  141. package/speechflow-cli/src/{speechflow-node-t2t-openai.ts → speechflow-node-t2t-spellcheck.ts} +48 -107
  142. package/speechflow-cli/src/speechflow-node-t2t-subtitle.ts +3 -6
  143. package/speechflow-cli/src/speechflow-node-t2t-summary.ts +229 -0
  144. package/speechflow-cli/src/speechflow-node-t2t-translate.ts +181 -0
  145. package/speechflow-cli/src/speechflow-node-x2x-filter.ts +16 -3
  146. package/speechflow-cli/src/speechflow-node-x2x-trace.ts +3 -3
  147. package/speechflow-cli/src/speechflow-node-xio-device.ts +4 -7
  148. package/speechflow-cli/src/speechflow-node-xio-exec.ts +210 -0
  149. package/speechflow-cli/src/speechflow-node-xio-file.ts +93 -80
  150. package/speechflow-cli/src/speechflow-node-xio-mqtt.ts +3 -2
  151. package/speechflow-cli/src/speechflow-node-xio-vban.ts +325 -0
  152. package/speechflow-cli/src/speechflow-node-xio-webrtc.ts +533 -0
  153. package/speechflow-cli/src/speechflow-node-xio-websocket.ts +2 -1
  154. package/speechflow-cli/src/speechflow-util-audio-wt.ts +4 -4
  155. package/speechflow-cli/src/speechflow-util-audio.ts +10 -10
  156. package/speechflow-cli/src/speechflow-util-error.ts +9 -7
  157. package/speechflow-cli/src/speechflow-util-llm.ts +367 -0
  158. package/speechflow-cli/src/speechflow-util-misc.ts +4 -4
  159. package/speechflow-cli/src/speechflow-util-queue.ts +4 -4
  160. package/speechflow-cli/src/speechflow-util-stream.ts +5 -3
  161. package/speechflow-cli/src/speechflow-util.ts +1 -0
  162. package/speechflow-ui-db/package.json +9 -9
  163. package/speechflow-ui-st/package.json +9 -9
  164. package/speechflow-cli/dst/speechflow-node-t2t-ollama.js +0 -293
  165. package/speechflow-cli/dst/speechflow-node-t2t-ollama.js.map +0 -1
  166. package/speechflow-cli/dst/speechflow-node-t2t-openai.js.map +0 -1
  167. package/speechflow-cli/dst/speechflow-node-t2t-transformers.js.map +0 -1
  168. package/speechflow-cli/src/speechflow-node-t2t-ollama.ts +0 -281
  169. package/speechflow-cli/src/speechflow-node-t2t-transformers.ts +0 -247
package/README.md CHANGED
@@ -26,7 +26,8 @@ speech-to-speech).
26
26
  **SpeechFlow** comes with built-in graph nodes for various functionalities:
27
27
 
28
28
  - file and audio device I/O for local connectivity,
29
- - WebSocket and MQTT network I/O for remote connectivity,
29
+ - WebSocket, MQTT, VBAN, and WebRTC network I/O for remote connectivity,
30
+ - external command execution I/O for process integration,
30
31
  - local Voice Activity Detection (VAD),
31
32
  - local voice gender recognition,
32
33
  - local audio LUFS-S/RMS metering,
@@ -38,20 +39,27 @@ speech-to-speech).
38
39
  - remote-controllable audio muting,
39
40
  - cloud-based speech-to-text conversion with
40
41
  [Amazon Transcribe](https://aws.amazon.com/transcribe/),
41
- [OpenAI GPT-Transcribe](https://platform.openai.com/docs/models/gpt-4o-mini-transcribe), or
42
- [Deepgram](https://deepgram.com).
42
+ [OpenAI GPT-Transcribe](https://platform.openai.com/docs/models/gpt-4o-mini-transcribe),
43
+ [Deepgram](https://deepgram.com), or
44
+ [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text).
43
45
  - cloud-based text-to-text translation (or spelling correction) with
44
46
  [DeepL](https://deepl.com),
45
47
  [Amazon Translate](https://aws.amazon.com/translate/),
46
- [Google Cloud Translate](https://cloud.google.com/translate), or
47
- [OpenAI GPT](https://openai.com).
48
+ [Google Cloud Translate](https://cloud.google.com/translate),
49
+ [OpenAI GPT](https://openai.com),
50
+ [Anthropic Claude](https://anthropic.com), or
51
+ [Google Gemini](https://ai.google.dev).
48
52
  - local text-to-text translation (or spelling correction) with
49
- [Ollama/Gemma](https://ollama.com) or
50
- [Transformers/OPUS](https://github.com/Helsinki-NLP/Opus-MT).
53
+ [Ollama](https://ollama.com) or
54
+ [OPUS-MT](https://github.com/Helsinki-NLP/Opus-MT).
51
55
  - cloud-based text-to-speech conversion with
52
- [ElevenLabs](https://elevenlabs.io/) or
53
- [Amazon Polly](https://aws.amazon.com/polly/).
54
- - local text-to-speech conversion with [Kokoro](https://github.com/nazdridoy/kokoro-tts).
56
+ [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech),
57
+ [ElevenLabs](https://elevenlabs.io/),
58
+ [Amazon Polly](https://aws.amazon.com/polly/), or
59
+ [Google Cloud Text-to-Speech](https://cloud.google.com/text-to-speech).
60
+ - local text-to-speech conversion with
61
+ [Kokoro](https://github.com/nazdridoy/kokoro-tts) or
62
+ [Supertonic](https://huggingface.co/Supertone/supertonic).
55
63
  - local [FFmpeg](https://ffmpeg.org/)-based speech-to-speech conversion,
56
64
  - local WAV speech-to-speech decoding/encoding,
57
65
  - local text-to-text formatting, regex-based modification,
@@ -221,8 +229,8 @@ They can also be found in the sample [speechflow.yaml](./etc/speechflow.yaml) fi
221
229
 
222
230
  ```
223
231
  xio-device(device: env.SPEECHFLOW_DEVICE_MIC, mode: "r") |
224
- a2a-wav(mode: "encode") |
225
- xio-file(path: "capture.wav", mode: "w", type: "audio")
232
+ a2a-wav(mode: "encode", seekable: true) |
233
+ xio-file(path: "capture.wav", mode: "w", type: "audio", seekable: true)
226
234
  ```
227
235
 
228
236
  - **Pass-Through**: Pass-through audio from microphone device to speaker
@@ -335,7 +343,10 @@ First a short overview of the available processing nodes:
335
343
  **xio-file**,
336
344
  **xio-device**,
337
345
  **xio-websocket**,
338
- **xio-mqtt**.
346
+ **xio-mqtt**,
347
+ **xio-vban**,
348
+ **xio-webrtc**,
349
+ **xio-exec**.
339
350
  - Audio-to-Audio nodes:
340
351
  **a2a-ffmpeg**,
341
352
  **a2a-wav**,
@@ -353,22 +364,29 @@ First a short overview of the available processing nodes:
353
364
  - Audio-to-Text nodes:
354
365
  **a2t-openai**,
355
366
  **a2t-amazon**,
356
- **a2t-deepgram**.
367
+ **a2t-deepgram**,
368
+ **a2t-google**.
357
369
  - Text-to-Text nodes:
358
370
  **t2t-deepl**,
359
371
  **t2t-amazon**,
360
- **t2t-openai**,
361
- **t2t-ollama**,
362
- **t2t-transformers**,
372
+ **t2t-opus**,
363
373
  **t2t-google**,
374
+ **t2t-translate**,
375
+ **t2t-spellcheck**,
376
+ **t2t-punctuation**,
364
377
  **t2t-modify**,
378
+ **t2t-profanity**,
379
+ **t2t-summary**,
365
380
  **t2t-subtitle**,
366
381
  **t2t-format**,
367
382
  **t2t-sentence**.
368
383
  - Text-to-Audio nodes:
384
+ **t2a-openai**,
369
385
  **t2a-amazon**,
370
386
  **t2a-elevenlabs**,
371
- **t2a-kokoro**.
387
+ **t2a-google**,
388
+ **t2a-kokoro**,
389
+ **t2a-supertonic**.
372
390
  - Any-to-Any nodes:
373
391
  **x2x-filter**,
374
392
  **x2x-trace**.
@@ -384,20 +402,24 @@ external files, devices and network services.
384
402
 
385
403
  > This node allows the reading/writing from/to files or from StdIO. It
386
404
  > is intended to be used as source and sink nodes in batch processing,
387
- > and as sing nodes in real-time processing.
405
+ > and as sink nodes in real-time processing. When `seekable` is enabled
406
+ > for write mode, the node uses a file descriptor allowing random access
407
+ > writes to specific file positions via the `chunk:seek` metadata field.
408
+ > Option `seekable` cannot be used on StdIO.
388
409
 
389
410
  | Port | Payload |
390
411
  | ------- | ----------- |
391
412
  | input | text, audio |
392
413
  | output | text, audio |
393
414
 
394
- | Parameter | Position | Default | Requirement |
395
- | ---------- | --------- | -------- | --------------------- |
396
- | **path** | 0 | *none* | *none* |
397
- | **mode** | 1 | "r" | `/^(?:r\|w\|rw)$/` |
398
- | **type** | 2 | "audio" | `/^(?:audio\|text)$/` |
399
- | **chunka** | | 200 | `10 <= n <= 1000` |
400
- | **chunkt** | | 65536 | `1024 <= n <= 131072` |
415
+ | Parameter | Position | Default | Requirement |
416
+ | -------------- | --------- | -------- | --------------------- |
417
+ | **path** | 0 | *none* | *none* |
418
+ | **mode** | 1 | "r" | `/^(?:r\|w)$/` |
419
+ | **type** | 2 | "audio" | `/^(?:audio\|text)$/` |
420
+ | **seekable** | | false | *none* |
421
+ | **chunkAudio** | | 200 | `10 <= n <= 1000` |
422
+ | **chunkText** | | 65536 | `1024 <= n <= 131072` |
401
423
 
402
424
  - Node: **xio-device**<br/>
403
425
  Purpose: **Microphone/speaker device source/sink**<br/>
@@ -437,11 +459,12 @@ external files, devices and network services.
437
459
  | ----------- | --------- | -------- | --------------------- |
438
460
  | **listen** | *none* | *none* | `/^(?:\|ws:\/\/(.+?):(\d+))$/` |
439
461
  | **connect** | *none* | *none* | `/^(?:\|ws:\/\/(.+?):(\d+)(?:\/.*)?)$/` |
440
- | **type** | *none* | "audio" | `/^(?:audio\|text)$/` |
462
+ | **mode** | *none* | "r" | `/^(?:r\|w\|rw)$/` |
463
+ | **type** | *none* | "text" | `/^(?:audio\|text)$/` |
441
464
 
442
465
  - Node: **xio-mqtt**<br/>
443
- Purpose: **MQTT sink**<br/>
444
- Example: `xio-mqtt(url: "mqtt://127.0.0.1:1883", username: "foo", password: "bar", topic: "quux")`
466
+ Purpose: **MQTT source/sink**<br/>
467
+ Example: `xio-mqtt(url: "mqtt://127.0.0.1:1883", username: "foo", password: "bar", topicWrite: "quux")`
445
468
  Notice: this node requires a peer MQTT broker!
446
469
 
447
470
  > This node allows reading/writing from/to MQTT broker topics. It is
@@ -450,15 +473,94 @@ external files, devices and network services.
450
473
 
451
474
  | Port | Payload |
452
475
  | ------- | ----------- |
453
- | input | text |
454
- | output | none |
476
+ | input | text, audio |
477
+ | output | text, audio |
455
478
 
456
- | Parameter | Position | Default | Requirement |
457
- | ------------ | --------- | -------- | --------------------- |
458
- | **url** | 0 | *none* | `/^(?:\|(?:ws\|mqtt):\/\/(.+?):(\d+))$/` |
459
- | **username** | 1 | *none* | `/^.+$/` |
460
- | **password** | 2 | *none* | `/^.+$/` |
461
- | **topic** | 3 | *none* | `/^.+$/` |
479
+ | Parameter | Position | Default | Requirement |
480
+ | -------------- | --------- | -------- | --------------------- |
481
+ | **url** | 0 | *none* | `/^(?:\|(?:ws\|mqtt):\/\/(.+?):(\d+)(?:\/.*)?)$/` |
482
+ | **username** | 1 | *none* | `/^.+$/` |
483
+ | **password** | 2 | *none* | `/^.+$/` |
484
+ | **topicRead** | 3 | *none* | `/^.+$/` |
485
+ | **topicWrite** | 4 | *none* | `/^.+$/` |
486
+ | **mode** | 5 | "w" | `/^(?:r\|w\|rw)$/` |
487
+ | **type** | 6 | "text" | `/^(?:audio\|text)$/` |
488
+
489
+ - Node: **xio-vban**<br/>
490
+ Purpose: **VBAN network audio source/sink**<br/>
491
+ Example: `xio-vban(listen: 6980, stream: "Stream1", mode: "r")`
492
+ Notice: this node requires a peer VBAN-compatible application!
493
+
494
+ > This node allows reading/writing audio from/to VBAN (VB-Audio
495
+ > Network) protocol endpoints. It is intended to be used for
496
+ > real-time audio streaming with applications like VoiceMeeter,
497
+ > VB-Audio Matrix, or other VBAN-compatible software. It supports
498
+ > various audio bit resolutions (8-bit, 16-bit, 24-bit, 32-bit,
499
+ > float32, float64) and automatic channel downmixing to mono.
500
+
501
+ | Port | Payload |
502
+ | ------- | ----------- |
503
+ | input | audio |
504
+ | output | audio |
505
+
506
+ | Parameter | Position | Default | Requirement |
507
+ | ----------- | --------- | --------- | ---------------------------- |
508
+ | **listen** | 0 | "" | `/^(?:\|\d+\|.+?:\d+)$/` |
509
+ | **connect** | 1 | "" | `/^(?:\|.+?:\d+)$/` |
510
+ | **stream** | 2 | "Stream" | `/^.{1,16}$/` |
511
+ | **mode** | 3 | "rw" | `/^(?:r\|w\|rw)$/` |
512
+
513
+ - Node: **xio-webrtc**<br/>
514
+ Purpose: **WebRTC audio streaming source (WHIP) or sink (WHEP)**<br/>
515
+ Example: `xio-webrtc(listen: 8085, path: "/webrtc", mode: "r")`
516
+
517
+ > This node allows real-time audio streaming using WebRTC technology
518
+ > via WebRTC-HTTP Ingestion Protocol (WHIP) or WebRTC-HTTP Egress
519
+ > Protocol (WHEP). It provides an HTTP server for SDP negotiation
520
+ > and uses Opus codec for audio encoding/decoding at 48kHz. The node
521
+ > can operate in WHIP mode (i.e., read mode where publishers POST
522
+ > SDP offers to SpeechFlow and SpeechFlow receives audio stream from
523
+ > them) or WHEP mode (i.e., write mode where viewers POST SDP offers
524
+ > to SpeechFlow and SpeechFlow sends audio stream to them). This node
525
+ > supports multiple simultaneous connections, configurable ICE servers
526
+ > for NAT traversal, and automatic connection lifecycle management.
527
+
528
+ | Port | Payload |
529
+ | ------- | ----------- |
530
+ | input | audio |
531
+ | output | audio |
532
+
533
+ | Parameter | Position | Default | Requirement |
534
+ | -------------- | --------- | --------- | ---------------------------- |
535
+ | **listen** | 0 | "8085" | `/^(?:\d+\|.+?:\d+)$/` |
536
+ | **path** | 1 | "/webrtc" | `/^\/.+$/` |
537
+ | **mode** | 2 | "r" | `/^(?:r\|w)$/` |
538
+ | **iceServers** | 3 | "" | `/^.*$/` |
539
+
540
+ - Node: **xio-exec**<br/>
541
+ Purpose: **External command execution source/sink**<br/>
542
+ Example: `xio-exec(command: "ffmpeg -i - -f s16le -", mode: "rw", type: "audio")`
543
+
544
+ > This node allows reading/writing from/to external commands via stdin/stdout.
545
+ > It executes arbitrary commands and pipes audio or text data through them,
546
+ > enabling integration with external processing tools. The node supports
547
+ > read-only mode (capturing stdout), write-only mode (sending to stdin),
548
+ > and bidirectional mode (both stdin and stdout). This is useful for integrating
549
+ > external audio/text processing tools like FFmpeg, SoX, or custom scripts into
550
+ > the SpeechFlow pipeline.
551
+
552
+ | Port | Payload |
553
+ | ------- | ----------- |
554
+ | input | text, audio |
555
+ | output | text, audio |
556
+
557
+ | Parameter | Position | Default | Requirement |
558
+ | -------------- | --------- | -------- | --------------------- |
559
+ | **command** | 0 | *none* | *required* |
560
+ | **mode** | 1 | "r" | `/^(?:r\|w\|rw)$/` |
561
+ | **type** | 2 | "audio" | `/^(?:audio\|text)$/` |
562
+ | **chunkAudio** | | 200 | `10 <= n <= 1000` |
563
+ | **chunkText** | | 65536 | `1024 <= n <= 131072` |
462
564
 
463
565
  ### Audio-to-Audio Nodes
464
566
 
@@ -477,10 +579,10 @@ The following nodes process audio chunks only.
477
579
  | input | audio |
478
580
  | output | audio |
479
581
 
480
- | Parameter | Position | Default | Requirement |
481
- | ----------- | --------- | -------- | ------------------ |
482
- | **src** | 0 | "pcm" | `/^(?:pcm\|wav\|mp3\|opus)$/` |
483
- | **dst** | 1 | "wav" | `/^(?:pcm\|wav\|mp3\|opus)$/` |
582
+ | Parameter | Position | Default | Requirement |
583
+ | --------- | --------- | -------- | ------------------ |
584
+ | **src** | 0 | "pcm" | `/^(?:pcm\|wav\|mp3\|opus)$/` |
585
+ | **dst** | 1 | "wav" | `/^(?:pcm\|wav\|mp3\|opus)$/` |
484
586
 
485
587
  - Node: **a2a-wav**<br/>
486
588
  Purpose: **WAV audio format conversion**<br/>
@@ -489,15 +591,20 @@ The following nodes process audio chunks only.
489
591
  > This node allows converting between PCM and WAV audio formats. It is
490
592
  > primarily intended to support the reading/writing of external WAV
491
593
  > format files, although SpeechFlow internally uses PCM format only.
594
+ > When `seekable` is enabled in encode mode, the node writes a corrected
595
+ > WAV header at the end of processing with accurate file size information
596
+ > by seeking back to position 0, producing standard-compliant WAV files.
597
+ > Option `seekable` requires a seekable output stream.
492
598
 
493
599
  | Port | Payload |
494
600
  | ------- | ----------- |
495
601
  | input | audio |
496
602
  | output | audio |
497
603
 
498
- | Parameter | Position | Default | Requirement |
499
- | ----------- | --------- | -------- | ------------------------ |
500
- | **mode** | 0 | "encode" | `/^(?:encode\|decode)$/` |
604
+ | Parameter | Position | Default | Requirement |
605
+ | ------------ | --------- | -------- | ------------------------ |
606
+ | **mode** | 0 | "encode" | `/^(?:encode\|decode)$/` |
607
+ | **seekable** | 1 | false | *none* |
501
608
 
502
609
  - Node: **a2a-mute**<br/>
503
610
  Purpose: **volume muting node**<br/>
@@ -512,8 +619,8 @@ The following nodes process audio chunks only.
512
619
  | input | audio |
513
620
  | output | audio |
514
621
 
515
- | Parameter | Position | Default | Requirement |
516
- | ----------- | --------- | -------- | ------------------------ |
622
+ | Parameter | Position | Default | Requirement |
623
+ | --------- | --------- | -------- | ------------------------ |
517
624
 
518
625
  - Node: **a2a-meter**<br/>
519
626
  Purpose: **Loudness metering node**<br/>
@@ -531,7 +638,7 @@ The following nodes process audio chunks only.
531
638
 
532
639
  | Parameter | Position | Default | Requirement |
533
640
  | ------------- | --------- | -------- | ---------------------- |
534
- | **interval** | 0 | 250 | *none* |
641
+ | **interval** | 0 | 100 | *none* |
535
642
  | **mode** | 1 | "filter" | `/^(?:filter\|sink)$/` |
536
643
  | **dashboard** | | *none* | *none* |
537
644
 
@@ -548,8 +655,8 @@ The following nodes process audio chunks only.
548
655
  | input | audio |
549
656
  | output | audio |
550
657
 
551
- | Parameter | Position | Default | Requirement |
552
- | ----------- | --------- | -------- | ------------------------ |
658
+ | Parameter | Position | Default | Requirement |
659
+ | --------- | --------- | -------- | ------------------------ |
553
660
  | **mode** | *none* | "unplugged" | `/^(?:silenced\|unplugged)$/` |
554
661
  | **posSpeechThreshold** | *none* | 0.50 | *none* |
555
662
  | **negSpeechThreshold** | *none* | 0.35 | *none* |
@@ -571,11 +678,12 @@ The following nodes process audio chunks only.
571
678
  | input | audio |
572
679
  | output | audio |
573
680
 
574
- | Parameter | Position | Default | Requirement |
575
- | -------------- | --------- | -------- | ------------------------ |
576
- | **window** | 0 | 500 | *none* |
577
- | **treshold** | 1 | 0.50 | *none* |
578
- | **hysteresis** | 2 | 0.25 | *none* |
681
+ | Parameter | Position | Default | Requirement |
682
+ | ------------------- | --------- | -------- | ------------------------ |
683
+ | **window** | 0 | 500 | *none* |
684
+ | **threshold** | 1 | 0.50 | *none* |
685
+ | **hysteresis** | 2 | 0.25 | *none* |
686
+ | **volumeThreshold** | 3 | -45 | *none* |
579
687
 
580
688
  - Node: **a2a-speex**<br/>
581
689
  Purpose: **Speex Noise Suppression node**<br/>
@@ -590,9 +698,9 @@ The following nodes process audio chunks only.
590
698
  | input | audio |
591
699
  | output | audio |
592
700
 
593
- | Parameter | Position | Default | Requirement |
594
- | ----------- | --------- | -------- | ------------------------ |
595
- | **attentuate** | 0 | -18 | *none* | `-60 <= n <= 0` |
701
+ | Parameter | Position | Default | Requirement |
702
+ | -------------- | --------- | -------- | ------------------ |
703
+ | **attentuate** | 0 | -18 | `-60 <= n <= 0` |
596
704
 
597
705
  - Node: **a2a-rnnoise**<br/>
598
706
  Purpose: **RNNoise Noise Suppression node**<br/>
@@ -606,8 +714,8 @@ The following nodes process audio chunks only.
606
714
  | input | audio |
607
715
  | output | audio |
608
716
 
609
- | Parameter | Position | Default | Requirement |
610
- | ----------- | --------- | -------- | ------------------------ |
717
+ | Parameter | Position | Default | Requirement |
718
+ | --------- | --------- | -------- | ------------------------ |
611
719
 
612
720
  - Node: **a2a-compressor**<br/>
613
721
  Purpose: **audio compressor node**<br/>
@@ -621,14 +729,17 @@ The following nodes process audio chunks only.
621
729
  | input | audio |
622
730
  | output | audio |
623
731
 
624
- | Parameter | Position | Default | Requirement |
625
- | ----------- | --------- | -------- | ------------------------ |
626
- | **thresholdDb** | *none* | -18 | `n <= 0 && n >= -60` |
627
- | **ratio** | *none* | 4 | `n >= 1 && n <= 20` |
628
- | **attackMs** | *none* | 10 | `n >= 0 && n <= 100` |
629
- | **releaseMs** | *none* | 50 | `n >= 0 && n <= 100` |
630
- | **kneeDb** | *none* | 6 | `n >= 0 && n <= 100` |
631
- | **makeupDb** | *none* | 0 | `n >= 0 && n <= 100` |
732
+ | Parameter | Position | Default | Requirement |
733
+ | --------------- | --------- | ------------ | ------------------------ |
734
+ | **type** | *none* | "standalone" | `/^(?:standalone\|sidechain)$/` |
735
+ | **mode** | *none* | "compress" | `/^(?:compress\|measure\|adjust)$/` |
736
+ | **bus** | *none* | "compressor" | `/^.+$/` |
737
+ | **thresholdDb** | *none* | -23 | `n <= 0 && n >= -100`|
738
+ | **ratio** | *none* | 4.0 | `n >= 1 && n <= 20` |
739
+ | **attackMs** | *none* | 10 | `n >= 0 && n <= 1000`|
740
+ | **releaseMs** | *none* | 50 | `n >= 0 && n <= 1000`|
741
+ | **kneeDb** | *none* | 6.0 | `n >= 0 && n <= 40` |
742
+ | **makeupDb** | *none* | 0 | `n >= -24 && n <= 24`|
632
743
 
633
744
  - Node: **a2a-expander**<br/>
634
745
  Purpose: **audio expander node**<br/>
@@ -642,14 +753,15 @@ The following nodes process audio chunks only.
642
753
  | input | audio |
643
754
  | output | audio |
644
755
 
645
- | Parameter | Position | Default | Requirement |
646
- | ----------- | --------- | -------- | ------------------------ |
647
- | **thresholdDb** | *none* | -45 | `n <= 0 && n >= -60` |
648
- | **ratio** | *none* | 4 | `n >= 1 && n <= 20` |
649
- | **attackMs** | *none* | 10 | `n >= 0 && n <= 100` |
650
- | **releaseMs** | *none* | 50 | `n >= 0 && n <= 100` |
651
- | **kneeDb** | *none* | 6 | `n >= 0 && n <= 100` |
652
- | **makeupDb** | *none* | 0 | `n >= 0 && n <= 100` |
756
+ | Parameter | Position | Default | Requirement |
757
+ | --------------- | --------- | -------- | --------------------- |
758
+ | **thresholdDb** | *none* | -45 | `n <= 0 && n >= -100` |
759
+ | **floorDb** | *none* | -64 | `n <= 0 && n >= -100` |
760
+ | **ratio** | *none* | 4.0 | `n >= 1 && n <= 20` |
761
+ | **attackMs** | *none* | 10 | `n >= 0 && n <= 1000` |
762
+ | **releaseMs** | *none* | 50 | `n >= 0 && n <= 1000` |
763
+ | **kneeDb** | *none* | 6.0 | `n >= 0 && n <= 40` |
764
+ | **makeupDb** | *none* | 0 | `n >= -24 && n <= 24` |
653
765
 
654
766
  - Node: **a2a-gain**<br/>
655
767
  Purpose: **audio gain adjustment node**<br/>
@@ -663,9 +775,9 @@ The following nodes process audio chunks only.
663
775
  | input | audio |
664
776
  | output | audio |
665
777
 
666
- | Parameter | Position | Default | Requirement |
667
- | ----------- | --------- | -------- | ------------------------ |
668
- | **db** | *none* | 12 | `n >= -60 && n <= -60` |
778
+ | Parameter | Position | Default | Requirement |
779
+ | --------- | --------- | -------- | --------------------- |
780
+ | **db** | 0 | 0 | `n >= -60 && n <= 60` |
669
781
 
670
782
  - Node: **a2a-pitch**<br/>
671
783
  Purpose: **audio pitch shifting and time stretching**<br/>
@@ -701,8 +813,9 @@ The following nodes process audio chunks only.
701
813
  | input | audio |
702
814
  | output | audio |
703
815
 
704
- | Parameter | Position | Default | Requirement |
705
- | ----------- | --------- | -------- | ------------------------ |
816
+ | Parameter | Position | Default | Requirement |
817
+ | ----------- | --------- | -------- | ---------------------- |
818
+ | **segment** | 0 | 50 | `n >= 10 && n <= 1000` |
706
819
 
707
820
  ### Audio-to-Text Nodes
708
821
 
@@ -719,7 +832,7 @@ The following nodes convert audio to text chunks.
719
832
 
720
833
  | Port | Payload |
721
834
  | ------- | ----------- |
722
- | input | text |
835
+ | input | audio |
723
836
  | output | text |
724
837
 
725
838
  | Parameter | Position | Default | Requirement |
@@ -770,9 +883,32 @@ The following nodes convert audio to text chunks.
770
883
  | ------------ | --------- | -------- | ------------------ |
771
884
  | **key** | *none* | env.SPEECHFLOW\_DEEPGRAM\_KEY | *none* |
772
885
  | **keyAdm** | *none* | env.SPEECHFLOW\_DEEPGRAM\_KEY\_ADM | *none* |
773
- | **model** | 0 | "nova-3" | *none* |
886
+ | **model** | 0 | "nova-2" | *none* |
774
887
  | **version** | 1 | "latest" | *none* |
775
888
  | **language** | 2 | "multi" | *none* |
889
+ | **interim** | 3 | false | *none* |
890
+
891
+ - Node: **a2t-google**<br/>
892
+ Purpose: **Google Cloud Speech-to-Text conversion**<br/>
893
+ Example: `a2t-google(language: "en-US")`<br/>
894
+ Notice: this node requires a Google Cloud API key!
895
+
896
+ > This node uses Google Cloud Speech-to-Text to perform Speech-to-Text (S2T)
897
+ > conversion, i.e., it recognizes speech in the input audio stream and
898
+ > outputs a corresponding text stream. It supports various languages
899
+ > and models, including the `latest_long` model for long-form audio.
900
+
901
+ | Port | Payload |
902
+ | ------- | ----------- |
903
+ | input | audio |
904
+ | output | text |
905
+
906
+ | Parameter | Position | Default | Requirement |
907
+ | ------------ | --------- | ------------- | ------------ |
908
+ | **key** | *none* | env.SPEECHFLOW\_GOOGLE\_KEY | *none* |
909
+ | **model** | 0 | "latest_long" | *none* |
910
+ | **language** | 1 | "en-US" | *none* |
911
+ | **interim** | 2 | false | *none* |
776
912
 
777
913
  ### Text-to-Text Nodes
778
914
 
@@ -783,73 +919,65 @@ The following nodes process text chunks only.
783
919
  Example: `t2t-deepl(src: "de", dst: "en")`<br/>
784
920
  Notice: this node requires an API key!
785
921
 
786
- > This node performs translation between English and German languages.
922
+ > This node performs translation between multiple languages.
787
923
 
788
924
  | Port | Payload |
789
925
  | ------- | ----------- |
790
926
  | input | text |
791
927
  | output | text |
792
928
 
793
- | Parameter | Position | Default | Requirement |
794
- | ------------ | --------- | -------- | ------------------ |
795
- | **key** | *none* | env.SPEECHFLOW\_DEEPL\_KEY | *none* |
796
- | **src** | 0 | "de" | `/^(?:de\|en)$/` |
797
- | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
929
+ | Parameter | Position | Default | Requirement |
930
+ | ------------ | --------- | ---------- | ----------------------------- |
931
+ | **key** | *none* | env.SPEECHFLOW\_DEEPL\_KEY | *none* |
932
+ | **src** | 0 | "de" | `/^(?:de\|en\|fr\|it)$/` |
933
+ | **dst** | 1 | "en" | `/^(?:de\|en\|fr\|it)$/` |
934
+ | **optimize** | 2 | "latency" | `/^(?:latency\|quality)$/` |
798
935
 
799
936
  - Node: **t2t-amazon**<br/>
800
937
  Purpose: **AWS Translate Text-to-Text translation**<br/>
801
938
  Example: `t2t-amazon(src: "de", dst: "en")`<br/>
802
939
  Notice: this node requires an API key!
803
940
 
804
- > This node performs translation between English and German languages.
941
+ > This node performs translation between multiple languages.
805
942
 
806
943
  | Port | Payload |
807
944
  | ------- | ----------- |
808
945
  | input | text |
809
946
  | output | text |
810
947
 
811
- | Parameter | Position | Default | Requirement |
812
- | ------------ | --------- | -------- | ------------------ |
813
- | **key** | *none* | env.SPEECHFLOW\_AMAZON\_KEY | *none* |
948
+ | Parameter | Position | Default | Requirement |
949
+ | ------------ | --------- | -------- | ---------------------------- |
950
+ | **key** | *none* | env.SPEECHFLOW\_AMAZON\_KEY | *none* |
814
951
  | **secKey** | *none* | env.SPEECHFLOW\_AMAZON\_KEY\_SEC | *none* |
815
- | **region** | *none* | "eu-central-1" | *none* |
816
- | **src** | 0 | "de" | `/^(?:de\|en)$/` |
817
- | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
952
+ | **region** | *none* | "eu-central-1" | *none* |
953
+ | **src** | 0 | "de" | `/^(?:de\|en\|fr\|it)$/` |
954
+ | **dst** | 1 | "en" | `/^(?:de\|en\|fr\|it)$/` |
818
955
 
819
- - Node: **t2t-openai**<br/>
820
- Purpose: **OpenAI/GPT Text-to-Text translation and spelling correction**<br/>
821
- Example: `t2t-openai(src: "de", dst: "en")`<br/>
822
- Notice: this node requires an OpenAI API key!
956
+ - Node: **t2t-opus**<br/>
957
+ Purpose: **OPUS-MT Text-to-Text translation**<br/>
958
+ Example: `t2t-opus(src: "de", dst: "en")`<br/>
823
959
 
824
960
  > This node performs translation between English and German languages
825
- > in the text stream or (if the source and destination language is
826
- > the same) spellchecking of English or German languages in the text
827
- > stream. It is based on the remote OpenAI cloud AI service and uses
828
- > the GPT-4o-mini LLM.
961
+ > in the text stream. It is based on the local OPUS-MT translation model.
829
962
 
830
963
  | Port | Payload |
831
964
  | ------- | ----------- |
832
965
  | input | text |
833
966
  | output | text |
834
967
 
835
- | Parameter | Position | Default | Requirement |
836
- | ------------ | --------- | -------- | ------------------ |
837
- | **api** | *none* | "https://api.openai.com" | `/^https?:\/\/.+?:\d+$/` |
968
+ | Parameter | Position | Default | Requirement |
969
+ | ------------ | --------- | -------- | ---------------- |
838
970
  | **src** | 0 | "de" | `/^(?:de\|en)$/` |
839
971
  | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
840
- | **key** | *none* | env.SPEECHFLOW\_OPENAI\_KEY | *none* |
841
- | **model** | *none* | "gpt-5-mini" | *none* |
842
972
 
843
- - Node: **t2t-ollama**<br/>
844
- Purpose: **Ollama/Gemma Text-to-Text translation and spelling correction**<br/>
845
- Example: `t2t-ollama(src: "de", dst: "en")`<br/>
846
- Notice: this node requires Ollama to be installed!
973
+ - Node: **t2t-google**<br/>
974
+ Purpose: **Google Cloud Translate Text-to-Text translation**<br/>
975
+ Example: `t2t-google(src: "de", dst: "en")`<br/>
976
+ Notice: this node requires a Google Cloud API key and project ID!
847
977
 
848
- > This node performs translation between English and German languages
849
- > in the text stream or (if the source and destination language is
850
- > the same) spellchecking of English or German languages in the text
851
- > stream. It is based on the local Ollama AI service and uses the
852
- > Google Gemma 3 LLM.
978
+ > This node performs translation between multiple languages
979
+ > in the text stream using Google Cloud Translate API.
980
+ > It supports German, English, French, and Italian languages.
853
981
 
854
982
  | Port | Payload |
855
983
  | ------- | ----------- |
@@ -858,48 +986,83 @@ The following nodes process text chunks only.
858
986
 
859
987
  | Parameter | Position | Default | Requirement |
860
988
  | ------------ | --------- | -------- | ------------------ |
861
- | **api** | *none* | "http://127.0.0.1:11434" | `/^https?:\/\/.+?:\d+$/` |
862
- | **model** | *none* | "gemma3:4b-it-q4_K_M" | *none* |
863
- | **src** | 0 | "de" | `/^(?:de\|en)$/` |
864
- | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
989
+ | **key** | *none* | env.SPEECHFLOW\_GOOGLE\_KEY | *none* |
990
+ | **src** | 0 | "de" | `/^(?:de\|en\|fr\|it)$/` |
991
+ | **dst** | 1 | "en" | `/^(?:de\|en\|fr\|it)$/` |
865
992
 
866
- - Node: **t2t-transformers**<br/>
867
- Purpose: **Transformers Text-to-Text translation**<br/>
868
- Example: `t2t-transformers(src: "de", dst: "en")`<br/>
993
+ - Node: **t2t-translate**<br/>
994
+ Purpose: **LLM-based Text-to-Text translation**<br/>
995
+ Example: `t2t-translate(src: "de", dst: "en")`<br/>
996
+ Notice: this node requires an LLM provider (Ollama by default, or cloud-based OpenAI/Anthropic/Google, or local HuggingFace Transformers)!
869
997
 
870
998
  > This node performs translation between English and German languages
871
- > in the text stream. It is based on local OPUS or SmolLM3 LLMs.
999
+ > in the text stream using an LLM service. Multiple LLM providers are
1000
+ > supported: local Ollama (default), local HuggingFace Transformers,
1001
+ > or cloud-based OpenAI, Anthropic, or Google.
872
1002
 
873
1003
  | Port | Payload |
874
1004
  | ------- | ----------- |
875
1005
  | input | text |
876
1006
  | output | text |
877
1007
 
878
- | Parameter | Position | Default | Requirement |
879
- | ------------ | --------- | -------- | ---------------- |
880
- | **model** | *none* | "OPUS" | `/^(?:OPUS\|SmolLM3)$/` |
881
- | **src** | 0 | "de" | `/^(?:de\|en)$/` |
882
- | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
1008
+ | Parameter | Position | Default | Requirement |
1009
+ | ------------ | --------- | ------------------------ | ---------------------------------------- |
1010
+ | **src** | 0 | "de" | `/^(?:de\|en)$/` |
1011
+ | **dst** | 1 | "en" | `/^(?:de\|en)$/` |
1012
+ | **provider** | *none* | "ollama" | `/^(?:openai\|anthropic\|google\|ollama\|transformers)$/` |
1013
+ | **api** | *none* | "http://127.0.0.1:11434" | `/^https?:\/\/.+?(:\d+)?$/` |
1014
+ | **model** | *none* | "gemma3:4b-it-q4\_K\_M" | *none* |
1015
+ | **key** | *none* | "" | *none* |
1016
+
1017
+ - Node: **t2t-spellcheck**<br/>
1018
+ Purpose: **LLM-based Text-to-Text spellchecking**<br/>
1019
+ Example: `t2t-spellcheck(lang: "en")`<br/>
1020
+ Notice: this node requires an LLM provider (Ollama by default, or cloud-based OpenAI/Anthropic/Google, or local HuggingFace Transformers)!
1021
+
1022
+ > This node performs spellchecking of English or German text using an
1023
+ > LLM service. It corrects spelling mistakes, adds missing punctuation,
1024
+ > but preserves grammar and word choice. Multiple LLM providers are
1025
+ > supported: local Ollama (default), local HuggingFace Transformers,
1026
+ > or cloud-based OpenAI, Anthropic, or Google.
883
1027
 
884
- - Node: **t2t-google**<br/>
885
- Purpose: **Google Cloud Translate Text-to-Text translation**<br/>
886
- Example: `t2t-google(src: "de", dst: "en")`<br/>
887
- Notice: this node requires a Google Cloud API key and project ID!
1028
+ | Port | Payload |
1029
+ | ------- | ----------- |
1030
+ | input | text |
1031
+ | output | text |
888
1032
 
889
- > This node performs translation between multiple languages
890
- > in the text stream using Google Cloud Translate API.
891
- > It supports German, English, French, and Italian languages.
1033
+ | Parameter | Position | Default | Requirement |
1034
+ | ------------ | --------- | ------------------------ | ---------------------------------------- |
1035
+ | **lang** | 0 | "en" | `/^(?:en\|de)$/` |
1036
+ | **provider** | *none* | "ollama" | `/^(?:openai\|anthropic\|google\|ollama\|transformers)$/` |
1037
+ | **api** | *none* | "http://127.0.0.1:11434" | `/^https?:\/\/.+?(:\d+)?$/` |
1038
+ | **model** | *none* | "gemma3:4b-it-q4\_K\_M" | *none* |
1039
+ | **key** | *none* | "" | *none* |
1040
+
1041
+ - Node: **t2t-punctuation**<br/>
1042
+ Purpose: **LLM-based punctuation restoration**<br/>
1043
+ Example: `t2t-punctuation(lang: "en")`<br/>
1044
+ Notice: this node requires an LLM provider (Ollama by default, or cloud-based OpenAI/Anthropic/Google, or local HuggingFace Transformers)!
1045
+
1046
+ > This node performs punctuation restoration using an LLM service.
1047
+ > It adds missing punctuation marks (periods, commas, question marks,
1048
+ > exclamation marks, colons, semicolons) and capitalizes the first
1049
+ > letters of sentences. It preserves all original words exactly as they
1050
+ > are without spelling corrections or grammar changes. Multiple LLM
1051
+ > providers are supported: local Ollama (default), local HuggingFace
1052
+ > Transformers, or cloud-based OpenAI, Anthropic, or Google.
892
1053
 
893
1054
  | Port | Payload |
894
1055
  | ------- | ----------- |
895
1056
  | input | text |
896
1057
  | output | text |
897
1058
 
898
- | Parameter | Position | Default | Requirement |
899
- | ------------ | --------- | -------- | ------------------ |
900
- | **key** | *none* | env.SPEECHFLOW\_GOOGLE\_KEY | *none* |
901
- | **src** | 0 | "de" | `/^(?:de\|en\|fr\|it)$/` |
902
- | **dst** | 1 | "en" | `/^(?:de\|en\|fr\|it)$/` |
1059
+ | Parameter | Position | Default | Requirement |
1060
+ | ------------ | --------- | ------------------------ | ---------------------------------------- |
1061
+ | **lang** | 0 | "en" | `/^(?:en\|de)$/` |
1062
+ | **provider** | *none* | "ollama" | `/^(?:openai\|anthropic\|google\|ollama\|transformers)$/` |
1063
+ | **api** | *none* | "http://127.0.0.1:11434" | `/^https?:\/\/.+?(:\d+)?$/` |
1064
+ | **model** | *none* | "gemma3:4b-it-q4\_K\_M" | *none* |
1065
+ | **key** | *none* | "" | *none* |
903
1066
 
904
1067
  - Node: **t2t-modify**<br/>
905
1068
  Purpose: **regex-based text modification**<br/>
@@ -919,6 +1082,53 @@ The following nodes process text chunks only.
919
1082
  | **match** | 0 | "" | *required* |
920
1083
  | **replace** | 1 | "" | *required* |
921
1084
 
1085
+ - Node: **t2t-profanity**<br/>
1086
+ Purpose: **profanity filtering**<br/>
1087
+ Example: `t2t-profanity(lang: "en", placeholder: "***")`<br/>
1088
+
1089
+ > This node filters profanity from the text stream by detecting bad words
1090
+ > and replacing them with a placeholder. It supports English and German
1091
+ > languages and can either replace with a fixed placeholder or repeat
1092
+ > the placeholder character for each character of the detected word.
1093
+
1094
+ | Port | Payload |
1095
+ | ------- | ----------- |
1096
+ | input | text |
1097
+ | output | text |
1098
+
1099
+ | Parameter | Position | Default | Requirement |
1100
+ | --------------- | --------- | ---------- | ------------------------ |
1101
+ | **lang** | *none* | "en" | `/^(?:en\|de)$/` |
1102
+ | **placeholder** | *none* | "\*\*\*" | *none* |
1103
+ | **mode** | *none* | "replace" | `/^(?:replace\|repeat)$/`|
1104
+
1105
+ - Node: **t2t-summary**<br/>
1106
+ Purpose: **LLM-based Text-to-Text summarization**<br/>
1107
+ Example: `t2t-summary(lang: "en", size: 4, trigger: 8)`<br/>
1108
+ Notice: this node requires an LLM provider (Ollama by default, or cloud-based OpenAI/Anthropic/Google, or local HuggingFace Transformers)!
1109
+
1110
+ > This node performs text summarization using an LLM service.
1111
+ > It accumulates incoming text sentences and generates a summary after
1112
+ > a configurable number of sentences (trigger). The summary length is
1113
+ > also configurable (size). It supports English and German languages.
1114
+ > Multiple LLM providers are supported: local Ollama (default), local
1115
+ > HuggingFace Transformers, or cloud-based OpenAI, Anthropic, or Google.
1116
+
1117
+ | Port | Payload |
1118
+ | ------- | ----------- |
1119
+ | input | text |
1120
+ | output | text |
1121
+
1122
+ | Parameter | Position | Default | Requirement |
1123
+ | ------------ | --------- | ------------------------ | ---------------------------------------- |
1124
+ | **provider** | *none* | "ollama" | `/^(?:openai\|anthropic\|google\|ollama\|transformers)$/` |
1125
+ | **api** | *none* | "http://127.0.0.1:11434" | `/^https?:\/\/.+?(:\d+)?$/` |
1126
+ | **model** | *none* | "gemma3:4b-it-q4\_K\_M" | *none* |
1127
+ | **key** | *none* | "" | *none* |
1128
+ | **lang** | 0 | "en" | `/^(?:en\|de)$/` |
1129
+ | **size** | 1 | 4 | `1 <= n <= 20` |
1130
+ | **trigger** | 2 | 8 | `1 <= n <= 100` |
1131
+
922
1132
  - Node: **t2t-sentence**<br/>
923
1133
  Purpose: **sentence splitting/merging**<br/>
924
1134
  Example: `t2t-sentence()`<br/>
@@ -977,6 +1187,32 @@ The following nodes process text chunks only.
977
1187
 
978
1188
  The following nodes convert text chunks to audio chunks.
979
1189
 
1190
+ - Node: **t2a-openai**<br/>
1191
+ Purpose: **OpenAI Text-to-Speech conversion**<br/>
1192
+ Example: `t2a-openai(voice: "nova", model: "tts-1-hd")`<br/>
1193
+ Notice: this node requires an OpenAI API key!
1194
+
1195
+ > This node uses OpenAI TTS to perform Text-to-Speech (T2S)
1196
+ > conversion, i.e., it converts the input text stream into an output
1197
+ > audio stream. It supports six built-in voices and two models:
1198
+ > `tts-1` for lower latency and `tts-1-hd` for higher quality.
1199
+ > The language is automatically detected from the input text and
1200
+ > supports many languages including German, English, French, Spanish,
1201
+ > Chinese, Japanese, and more (no language parameter needed).
1202
+
1203
+ | Port | Payload |
1204
+ | ------- | ----------- |
1205
+ | input | text |
1206
+ | output | audio |
1207
+
1208
+ | Parameter | Position | Default | Requirement |
1209
+ | -------------- | --------- | --------- | ------------------ |
1210
+ | **key** | *none* | env.SPEECHFLOW\_OPENAI\_KEY | *none* |
1211
+ | **api** | *none* | "https://api.openai.com/v1" | `/^https?:\/\/.+/` |
1212
+ | **voice** | 0 | "alloy" | `/^(?:alloy\|echo\|fable\|onyx\|nova\|shimmer)$/` |
1213
+ | **model** | 1 | "tts-1" | `/^(?:tts-1\|tts-1-hd)$/` |
1214
+ | **speed** | 2 | 1.0 | `0.25 <= n <= 4.0` |
1215
+
980
1216
  - Node: **t2a-amazon**<br/>
981
1217
  Purpose: **Amazon Polly Text-to-Speech conversion**<br/>
982
1218
  Example: `t2a-amazon(language: "en", voice: "Danielle")`<br/>
@@ -996,7 +1232,7 @@ The following nodes convert text chunks to audio chunks.
996
1232
  | **key** | *none* | env.SPEECHFLOW\_AMAZON\_KEY | *none* |
997
1233
  | **secKey** | *none* | env.SPEECHFLOW\_AMAZON\_KEY\_SEC | *none* |
998
1234
  | **region** | *none* | "eu-central-1" | *none* |
999
- | **voice** | 0 | "Amy" | `^(?:Amy|Danielle|Joanna|Matthew|Ruth|Stephen|Viki|Daniel)$/` |
1235
+ | **voice** | 0 | "Amy" | `/^(?:Amy\|Danielle\|Joanna\|Matthew\|Ruth\|Stephen\|Vicki\|Daniel)$/` |
1000
1236
  | **language** | 1 | "en" | `/^(?:de\|en)$/` |
1001
1237
 
1002
1238
  - Node: **t2a-elevenlabs**<br/>
@@ -1018,11 +1254,34 @@ The following nodes convert text chunks to audio chunks.
1018
1254
  | **key** | *none* | env.SPEECHFLOW\_ELEVENLABS\_KEY | *none* |
1019
1255
  | **voice** | 0 | "Brian" | `/^(?:Brittney\|Cassidy\|Leonie\|Mark\|Brian)$/` |
1020
1256
  | **language** | 1 | "de" | `/^(?:de\|en)$/` |
1021
- | **speed** | 2 | 1.00 | `n >= 0`7 && n <= 1.2` |
1257
+ | **speed** | 2 | 1.00 | `n >= 0.7 && n <= 1.2` |
1022
1258
  | **stability** | 3 | 0.5 | `n >= 0.0 && n <= 1.0` |
1023
1259
  | **similarity** | 4 | 0.75 | `n >= 0.0 && n <= 1.0` |
1024
1260
  | **optimize** | 5 | "latency" | `/^(?:latency\|quality)$/` |
1025
1261
 
1262
+ - Node: **t2a-google**<br/>
1263
+ Purpose: **Google Cloud Text-to-Speech conversion**<br/>
1264
+ Example: `t2a-google(voice: "en-US-Neural2-J", language: "en-US")`<br/>
1265
+ Notice: this node requires a Google Cloud API key!
1266
+
1267
+ > This node uses Google Cloud Text-to-Speech to perform Text-to-Speech (T2S)
1268
+ > conversion, i.e., it converts the input text stream into an output
1269
+ > audio stream. It supports various voices and languages with configurable
1270
+ > speaking rate and pitch adjustment.
1271
+
1272
+ | Port | Payload |
1273
+ | ------- | ----------- |
1274
+ | input | text |
1275
+ | output | audio |
1276
+
1277
+ | Parameter | Position | Default | Requirement |
1278
+ | ------------ | --------- | ------------------ | -------------------- |
1279
+ | **key** | *none* | env.SPEECHFLOW\_GOOGLE\_KEY | *none* |
1280
+ | **voice** | 0 | "en-US-Neural2-J" | *none* |
1281
+ | **language** | 1 | "en-US" | *none* |
1282
+ | **speed** | 2 | 1.0 | `0.25 <= n <= 4.0` |
1283
+ | **pitch** | 3 | 0.0 | `-20.0 <= n <= 20.0` |
1284
+
1026
1285
  - Node: **t2a-kokoro**<br/>
1027
1286
  Purpose: **Kokoro Text-to-Speech conversion**<br/>
1028
1287
  Example: `t2a-kokoro(language: "en")`<br/>
@@ -1043,6 +1302,26 @@ The following nodes convert text chunks to audio chunks.
1043
1302
  | **language** | 1 | "en" | `/^en$/` |
1044
1303
  | **speed** | 2 | 1.25 | 1.0...1.30 |
1045
1304
 
1305
+ - Node: **t2a-supertonic**<br/>
1306
+ Purpose: **Supertonic Text-to-Speech conversion**<br/>
1307
+ Example: `t2a-supertonic(voice: "M1", speed: 1.40)`<br/>
1308
+
1309
+ > This node uses Supertonic to perform Text-to-Speech (T2S) conversion,
1310
+ > i.e., it converts the input text stream into an output audio stream.
1311
+ > It is intended to generate speech. The ONNX models are automatically
1312
+ > downloaded from HuggingFace on first use. It supports the English language only.
1313
+
1314
+ | Port | Payload |
1315
+ | ------- | ----------- |
1316
+ | input | text |
1317
+ | output | audio |
1318
+
1319
+ | Parameter | Position | Default | Requirement |
1320
+ | ------------ | --------- | -------- | ----------- |
1321
+ | **voice** | 0 | "M1" | `/^(?:M1\|M2\|F1\|F2)$/` |
1322
+ | **speed** | 1 | 1.40 | `0.5 <= n <= 2.0` |
1323
+ | **steps** | 2 | 20 | `1 <= n <= 20` |
1324
+
1046
1325
  ### Any-to-Any Nodes
1047
1326
 
1048
1327
  The following nodes process any type of chunk, i.e., both audio and text chunks.
@@ -1064,8 +1343,8 @@ The following nodes process any type of chunk, i.e., both audio and text chunks.
1064
1343
  | Parameter | Position | Default | Requirement |
1065
1344
  | ------------ | --------- | -------- | --------------------- |
1066
1345
  | **type** | 0 | "audio" | `/^(?:audio\|text)$/` |
1067
- | **name** | 1 | "filter" | `/^.+$/` |
1068
- | **var** | 2 | "" | `/^(?:meta:.+\|payload:(?:length\|text)\|time:(?:start\|end))$/` |
1346
+ | **name** | 1 | "filter" | `/^.+?$/` |
1347
+ | **var** | 2 | "" | `/^(?:meta:.+\|payload:(?:length\|text)\|time:(?:start\|end)\|kind\|type)$/` |
1069
1348
  | **op** | 3 | "==" | `/^(?:<\|<=\|==\|!=\|~~\|!~\|>=\|>)$/` |
1070
1349
  | **val** | 4 | "" | `/^.*$/` |
1071
1350