speechflow 0.9.0
- package/README.md +239 -0
- package/dst/speechflow-node-deepgram.js +135 -0
- package/dst/speechflow-node-deepl.js +105 -0
- package/dst/speechflow-node-device.js +95 -0
- package/dst/speechflow-node-elevenlabs.js +131 -0
- package/dst/speechflow-node-file.js +47 -0
- package/dst/speechflow-node-websocket.js +147 -0
- package/dst/speechflow-node.js +77 -0
- package/dst/speechflow-util.js +37 -0
- package/dst/speechflow.js +223 -0
- package/etc/biome.jsonc +37 -0
- package/etc/eslint.mjs +95 -0
- package/etc/nps.yaml +40 -0
- package/etc/oxlint.jsonc +20 -0
- package/etc/tsconfig.json +23 -0
- package/package.json +76 -0
- package/sample.yaml +32 -0
- package/src/lib.d.ts +20 -0
- package/src/speechflow-logo.ai +1492 -4
- package/src/speechflow-logo.svg +46 -0
- package/src/speechflow-node-deepgram.ts +102 -0
- package/src/speechflow-node-deepl.ts +76 -0
- package/src/speechflow-node-device.ts +96 -0
- package/src/speechflow-node-elevenlabs.ts +99 -0
- package/src/speechflow-node-file.ts +46 -0
- package/src/speechflow-node-websocket.ts +140 -0
- package/src/speechflow-node.ts +76 -0
- package/src/speechflow-util.ts +36 -0
- package/src/speechflow.ts +242 -0
- package/tsconfig.json +3 -0
package/README.md
ADDED
@@ -0,0 +1,239 @@

<img src="https://raw.githubusercontent.com/rse/speechflow/master/src/speechflow-logo.svg" width="400" align="right" alt=""/>

SpeechFlow
==========

**Speech Processing Flow Graph**

[](https://github.com/rse)
[](https://github.com/rse)
[](https://github.com/rse/speechflow)
[](https://github.com/rse/speechflow)

About
-----

**SpeechFlow** is a command-line tool for establishing a directed data
flow graph of audio and text processing nodes. This allows various
speech processing tasks to be performed in a flexible way.

Installation
------------

```
$ npm install -g speechflow
```

Usage
-----

```
$ speechflow
    [-h|--help]
    [-V|--version]
    [-v|--verbose <level>]
    [-e|--expression <expression>]
    [-f|--expression-file <expression-file>]
    [-c|--config <key>@<yaml-config-file>]
    [<argument> [...]]
```
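
For instance, the transcription example from the next section can be run directly from the command line. The following invocation is a hypothetical sketch: it assumes that positional arguments are exposed to the graph expression as `argv.0`, `argv.1`, etc. (as used in the examples below) and that the Deepgram API key is supplied via the `SPEECHFLOW_KEY_DEEPGRAM` environment variable described under *Processing Node Types*:

```
$ SPEECHFLOW_KEY_DEEPGRAM="<your-key>" speechflow \
    -e 'file(path: argv.0, mode: "r", type: "audio") |
        deepgram(language: "en") |
        file(path: argv.1, mode: "w", type: "text")' \
    recording.pcm transcript.txt
```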

Processing Graph Examples
-------------------------

- Capture audio from a microphone to a file:

  ```
  device(device: "wasapi:VoiceMeeter Out B1", mode: "r") |
  file(path: "capture.pcm", mode: "w", type: "audio")
  ```

- Transcribe an audio file into a text file:

  ```
  file(path: argv.0, mode: "r", type: "audio") |
  deepgram(language: "en") |
  file(path: argv.1, mode: "w", type: "text")
  ```

- Translate stdin to stdout:

  ```
  file(path: "-", mode: "r", type: "text") |
  deepl(src: "de", dst: "en-US") |
  file(path: "-", mode: "w", type: "text")
  ```

- Pass audio through from microphone to speaker and record it to a file in parallel:

  ```
  device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | {
      file(path: "capture.pcm", mode: "w", type: "audio"),
      device(device: "wasapi:VoiceMeeter VAIO3 Input", mode: "w")
  }
  ```

- Real-time translation from German to English, including capture of all inputs and outputs:

  ```
  device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | {
      file(path: "translation-audio-de.pcm", mode: "w", type: "audio"),
      deepgram(language: "de") |
      file(path: "translation-text-de.txt", mode: "w", type: "text")
  } | {
      deepl(src: "de", dst: "en-US") |
      file(path: "translation-text-en.txt", mode: "w", type: "text")
  } | {
      elevenlabs(language: "en") | {
          file(path: "translation-audio-en.pcm", mode: "w", type: "audio"),
          device(device: "wasapi:VoiceMeeter VAIO3 Input", mode: "w")
      }
  }
  ```

Processing Node Types
---------------------

Currently **SpeechFlow** provides the following processing nodes:

- Node: **file**<br/>
  Purpose: **File and StdIO source/sink**<br/>
  Example: `file(path: "capture.pcm", mode: "w", type: "audio")`

  | Port    | Payload     |
  | ------- | ----------- |
  | input   | text, audio |
  | output  | text, audio |

  | Parameter  | Position  | Default  | Requirement           |
  | ---------- | --------- | -------- | --------------------- |
  | **path**   | 0         | *none*   | *none*                |
  | **mode**   | 1         | "r"      | `/^(?:r\|w\|rw)$/`    |
  | **type**   | 2         | "audio"  | `/^(?:audio\|text)$/` |

- Node: **websocket**<br/>
  Purpose: **WebSocket source/sink**<br/>
  Example: `websocket(connect: "ws://127.0.0.1:12345", type: "text")`

  | Port    | Payload     |
  | ------- | ----------- |
  | input   | text, audio |
  | output  | text, audio |

  | Parameter   | Position  | Default  | Requirement                             |
  | ----------- | --------- | -------- | --------------------------------------- |
  | **listen**  | *none*    | *none*   | `/^(?:\|ws:\/\/(.+?):(\d+))$/`          |
  | **connect** | *none*    | *none*   | `/^(?:\|ws:\/\/(.+?):(\d+)(?:\/.*)?)$/` |
  | **type**    | *none*    | "audio"  | `/^(?:audio\|text)$/`                   |

- Node: **device**<br/>
  Purpose: **Microphone/speaker device source/sink**<br/>
  Example: `device(device: "wasapi:VoiceMeeter Out B1", mode: "r")`

  | Port    | Payload |
  | ------- | ------- |
  | input   | audio   |
  | output  | audio   |

  | Parameter   | Position  | Default  | Requirement        |
  | ----------- | --------- | -------- | ------------------ |
  | **device**  | 0         | *none*   | `/^(.+?):(.+)$/`   |
  | **mode**    | 1         | "rw"     | `/^(?:r\|w\|rw)$/` |

- Node: **deepgram**<br/>
  Purpose: **Deepgram Speech-to-Text conversion**<br/>
  Example: `deepgram(language: "de")`<br/>
  Notice: this node requires an API key!

  | Port    | Payload |
  | ------- | ------- |
  | input   | audio   |
  | output  | text    |

  | Parameter    | Position  | Default                       | Requirement |
  | ------------ | --------- | ----------------------------- | ----------- |
  | **key**      | *none*    | env.SPEECHFLOW\_KEY\_DEEPGRAM | *none*      |
  | **model**    | 0         | "nova-2"                      | *none*      |
  | **version**  | 1         | "latest"                      | *none*      |
  | **language** | 2         | "de"                          | *none*      |

- Node: **deepl**<br/>
  Purpose: **DeepL Text-to-Text translation**<br/>
  Example: `deepl(src: "de", dst: "en-US")`<br/>
  Notice: this node requires an API key!

  | Port    | Payload |
  | ------- | ------- |
  | input   | text    |
  | output  | text    |

  | Parameter    | Position  | Default                    | Requirement         |
  | ------------ | --------- | -------------------------- | ------------------- |
  | **key**      | *none*    | env.SPEECHFLOW\_KEY\_DEEPL | *none*              |
  | **src**      | 0         | "de"                       | `/^(?:de\|en-US)$/` |
  | **dst**      | 1         | "en-US"                    | `/^(?:de\|en-US)$/` |

- Node: **elevenlabs**<br/>
  Purpose: **ElevenLabs Text-to-Speech conversion**<br/>
  Example: `elevenlabs(language: "en")`<br/>
  Notice: this node requires an API key!

  | Port    | Payload |
  | ------- | ------- |
  | input   | text    |
  | output  | audio   |

  | Parameter    | Position  | Default                         | Requirement |
  | ------------ | --------- | ------------------------------- | ----------- |
  | **key**      | *none*    | env.SPEECHFLOW\_KEY\_ELEVENLABS | *none*      |
  | **voice**    | 0         | "Brian"                         | *none*      |
  | **language** | 1         | "de"                            | *none*      |
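
The three API-backed nodes read their keys from the environment by default (see the `key` parameter defaults above), so a typical session exports them before running **speechflow**. The values below are placeholders, not real keys:

```
$ export SPEECHFLOW_KEY_DEEPGRAM="<your-deepgram-key>"
$ export SPEECHFLOW_KEY_DEEPL="<your-deepl-key>"
$ export SPEECHFLOW_KEY_ELEVENLABS="<your-elevenlabs-key>"
```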

Graph Expression Language
-------------------------

The **SpeechFlow** graph expression language is based on
[**FlowLink**](https://npmjs.org/flowlink), whose expression language
follows this BNF-style grammar:

```
expr         ::= parallel
               | sequential
               | node
               | group
parallel     ::= sequential ("," sequential)+
sequential   ::= node ("|" node)+
node         ::= id ("(" (param ("," param)*)? ")")?
param        ::= array | object | variable | template | string | number | value
group        ::= "{" expr "}"
id           ::= /[a-zA-Z_][a-zA-Z0-9_-]*/
variable     ::= id
array        ::= "[" (param ("," param)*)? "]"
object       ::= "{" (id ":" param ("," id ":" param)*)? "}"
template     ::= "`" ("${" variable "}" / ("\\`"|.))* "`"
string       ::= /"(\\"|.)*"/
               | /'(\\'|.)*'/
number       ::= /[+-]?/ number-value
number-value ::= "0b" /[01]+/
               | "0o" /[0-7]+/
               | "0x" /[0-9a-fA-F]+/
               | /[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?/
               | /[0-9]+/
value        ::= "true" | "false" | "null" | "NaN" | "undefined"
```
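
Relating the grammar to the earlier examples: the expression below (repeated from the pass-through example above) is parsed as a `sequential` chain of `node`s, the braces form a `group` whose comma-separated branches make up a `parallel` list, and the quoted values inside the parentheses are `string` params:

```
device(device: "wasapi:VoiceMeeter Out B1", mode: "r") | {
    file(path: "capture.pcm", mode: "w", type: "audio"),
    device(device: "wasapi:VoiceMeeter VAIO3 Input", mode: "w")
}
```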

History
-------

**SpeechFlow**, as a technical cut-through, was initially created in
March 2024 for use in the msg Filmstudio context. It was refined into a
more complete toolkit in April 2025, when it was used in production for
the first time.

Copyright & License
-------------------

Copyright © 2024-2025 [Dr. Ralf S. Engelschall](mailto:rse@engelschall.com)<br/>
Licensed under [GPL 3.0](https://spdx.org/licenses/GPL-3.0-only)

package/dst/speechflow-node-deepgram.js
ADDED
@@ -0,0 +1,135 @@
"use strict";
/*
** SpeechFlow - Speech Processing Flow Graph
** Copyright (c) 2024-2025 Dr. Ralf S. Engelschall <rse@engelschall.com>
** Licensed under GPL 3.0 <https://spdx.org/licenses/GPL-3.0-only>
*/
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
    if (k2 === undefined) k2 = k;
    var desc = Object.getOwnPropertyDescriptor(m, k);
    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
        desc = { enumerable: true, get: function() { return m[k]; } };
    }
    Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
    if (k2 === undefined) k2 = k;
    o[k2] = m[k];
}));
var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
    Object.defineProperty(o, "default", { enumerable: true, value: v });
}) : function(o, v) {
    o["default"] = v;
});
var __importStar = (this && this.__importStar) || (function () {
    var ownKeys = function(o) {
        ownKeys = Object.getOwnPropertyNames || function (o) {
            var ar = [];
            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
            return ar;
        };
        return ownKeys(o);
    };
    return function (mod) {
        if (mod && mod.__esModule) return mod;
        var result = {};
        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
        __setModuleDefault(result, mod);
        return result;
    };
})();
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const node_events_1 = require("node:events");
const node_stream_1 = __importDefault(require("node:stream"));
const Deepgram = __importStar(require("@deepgram/sdk"));
const speechflow_node_1 = __importDefault(require("./speechflow-node"));
class SpeechFlowNodeDevice extends speechflow_node_1.default {
    dg = null;
    constructor(id, opts, args) {
        super(id, opts, args);
        this.configure({
            key: { type: "string", val: process.env.SPEECHFLOW_KEY_DEEPGRAM },
            model: { type: "string", val: "nova-2", pos: 0 }, /* FIXME: nova-3 multiligual */
            version: { type: "string", val: "latest", pos: 1 },
            language: { type: "string", val: "de", pos: 2 }
        });
    }
    async open() {
        this.input = "audio";
        this.output = "text";
        this.stream = null;
        /* sanity check situation */
        if (this.config.audioBitDepth !== 16 || !this.config.audioLittleEndian)
            throw new Error("Deepgram node currently supports PCM-S16LE audio only");
        /* connect to Deepgram API */
        const queue = new node_events_1.EventEmitter();
        const deepgram = Deepgram.createClient(this.params.key);
        this.dg = deepgram.listen.live({
            model: this.params.model,
            version: this.params.version,
            language: this.params.language,
            channels: this.config.audioChannels,
            sample_rate: this.config.audioSampleRate,
            encoding: "linear16",
            multichannel: false,
            // endpointing: false, /* FIXME: ? */
            interim_results: false,
            smart_format: true,
            punctuate: true,
            filler_words: true,
            diarize: true,
            numerals: true,
            paragraphs: true,
            profanity_filter: true,
            utterances: false,
        });
        await new Promise((resolve) => {
            this.dg.on(Deepgram.LiveTranscriptionEvents.Open, () => {
                this.log("info", "Deepgram: connection open");
                resolve(true);
            });
        });
        /* hooks onto Deepgram API events */
        this.dg.on(Deepgram.LiveTranscriptionEvents.Close, () => {
            this.log("info", "Deepgram: connection close");
        });
        this.dg.on(Deepgram.LiveTranscriptionEvents.Transcript, async (data) => {
            const text = data.channel?.alternatives[0].transcript ?? "";
            if (text === "")
                return;
            queue.emit("text", text);
        });
        this.dg.on(Deepgram.LiveTranscriptionEvents.Error, (error) => {
            this.log("error", `Deepgram: ${error}`);
        });
        /* provide Duplex stream and internally attach to Deepgram API */
        const dg = this.dg;
        this.stream = new node_stream_1.default.Duplex({
            write(chunk, encoding, callback) {
                const data = chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength);
                if (data.byteLength === 0)
                    queue.emit("text", "");
                else
                    dg.send(data);
                callback();
            },
            read(size) {
                queue.once("text", (text) => {
                    if (text !== "")
                        this.push(text);
                });
            }
        });
    }
    async close() {
        if (this.stream !== null) {
            this.stream.destroy();
            this.stream = null;
        }
        if (this.dg !== null)
            this.dg.requestClose();
    }
}
exports.default = SpeechFlowNodeDevice;
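
The deepgram node above and the deepl node that follows bridge Node.js streams to an asynchronous backend with the same pattern: a `Duplex` whose `write()` side hands data to the backend and whose `read()` side waits on an `EventEmitter` used as a result queue. The following is a minimal, self-contained sketch of that pattern; the `backend` function is a stand-in and not part of the package:

```
const Stream = require("node:stream");
const { EventEmitter } = require("node:events");

const queue = new EventEmitter();

/* stand-in for the external service call (Deepgram/DeepL in the real nodes) */
const backend = async (text) => text.toUpperCase();

const duplex = new Stream.Duplex({
    write(chunk, encoding, callback) {
        backend(chunk.toString()).then((result) => {
            queue.emit("result", result);   /* hand the result over to read() */
            callback();
        }).catch((err) => callback(err));
    },
    read(size) {
        queue.once("result", (result) => {
            this.push(result);              /* deliver the result downstream */
        });
    }
});

/* usage: pipe text through the bridge */
duplex.on("data", (data) => console.log(data.toString()));
duplex.write("hello, world");
```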

package/dst/speechflow-node-deepl.js
ADDED
@@ -0,0 +1,105 @@
"use strict";
/*
** SpeechFlow - Speech Processing Flow Graph
** Copyright (c) 2024-2025 Dr. Ralf S. Engelschall <rse@engelschall.com>
** Licensed under GPL 3.0 <https://spdx.org/licenses/GPL-3.0-only>
*/
var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
    if (k2 === undefined) k2 = k;
    var desc = Object.getOwnPropertyDescriptor(m, k);
    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
        desc = { enumerable: true, get: function() { return m[k]; } };
    }
    Object.defineProperty(o, k2, desc);
}) : (function(o, m, k, k2) {
    if (k2 === undefined) k2 = k;
    o[k2] = m[k];
}));
var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
    Object.defineProperty(o, "default", { enumerable: true, value: v });
}) : function(o, v) {
    o["default"] = v;
});
var __importStar = (this && this.__importStar) || (function () {
    var ownKeys = function(o) {
        ownKeys = Object.getOwnPropertyNames || function (o) {
            var ar = [];
            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
            return ar;
        };
        return ownKeys(o);
    };
    return function (mod) {
        if (mod && mod.__esModule) return mod;
        var result = {};
        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
        __setModuleDefault(result, mod);
        return result;
    };
})();
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const node_stream_1 = __importDefault(require("node:stream"));
const node_events_1 = require("node:events");
const speechflow_node_1 = __importDefault(require("./speechflow-node"));
const DeepL = __importStar(require("deepl-node"));
class SpeechFlowNodeDeepL extends speechflow_node_1.default {
    translator = null;
    constructor(id, opts, args) {
        super(id, opts, args);
        this.input = "text";
        this.output = "text";
        this.stream = null;
        this.configure({
            key: { type: "string", val: process.env.SPEECHFLOW_KEY_DEEPL },
            src: { type: "string", pos: 0, val: "de", match: /^(?:de|en-US)$/ },
            dst: { type: "string", pos: 1, val: "en-US", match: /^(?:de|en-US)$/ }
        });
    }
    async open() {
        /* instantiate DeepL API SDK */
        this.translator = new DeepL.Translator(this.params.key);
        /* provide text-to-text translation */
        const translate = async (text) => {
            const result = await this.translator.translateText(text, this.params.src, this.params.dst, {
                splitSentences: "off"
            });
            return (result?.text ?? text);
        };
        /* establish a duplex stream and connect it to the translation */
        const queue = new node_events_1.EventEmitter();
        this.stream = new node_stream_1.default.Duplex({
            write(chunk, encoding, callback) {
                const data = chunk.toString();
                if (data === "") {
                    queue.emit("result", "");
                    callback();
                }
                else {
                    translate(data).then((result) => {
                        queue.emit("result", result);
                        callback();
                    }).catch((err) => {
                        callback(err);
                    });
                }
            },
            read(size) {
                queue.once("result", (result) => {
                    this.push(result);
                });
            }
        });
    }
    async close() {
        if (this.stream !== null) {
            this.stream.destroy();
            this.stream = null;
        }
        if (this.translator !== null)
            this.translator = null;
    }
}
exports.default = SpeechFlowNodeDeepL;

package/dst/speechflow-node-device.js
ADDED
@@ -0,0 +1,95 @@
"use strict";
/*
** SpeechFlow - Speech Processing Flow Graph
** Copyright (c) 2024-2025 Dr. Ralf S. Engelschall <rse@engelschall.com>
** Licensed under GPL 3.0 <https://spdx.org/licenses/GPL-3.0-only>
*/
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const naudiodon_1 = __importDefault(require("@gpeng/naudiodon"));
const speechflow_node_1 = __importDefault(require("./speechflow-node"));
const speechflow_util_1 = __importDefault(require("./speechflow-util"));
class SpeechFlowNodeDevice extends speechflow_node_1.default {
    io = null;
    constructor(id, opts, args) {
        super(id, opts, args);
        this.configure({
            device: { type: "string", pos: 0, match: /^(.+?):(.+)$/ },
            mode: { type: "string", pos: 1, val: "rw", match: /^(?:r|w|rw)$/ }
        });
    }
    async open() {
        /* determine device */
        const device = speechflow_util_1.default.audioDeviceFromURL(this.params.mode, this.params.device);
        /* sanity check sample rate compatibility
           (we still do not resample in input/output for simplification reasons) */
        if (device.defaultSampleRate !== this.config.audioSampleRate)
            throw new Error(`device audio sample rate ${device.defaultSampleRate} is ` +
                `incompatible with required sample rate ${this.config.audioSampleRate}`);
        /* establish device connection
           Notice: "naudion" actually implements Stream.{Readable,Writable,Duplex}, but
           declares just its sub-interface NodeJS.{Readable,Writable,Duplex}Stream,
           so it is correct to cast it back to Stream.{Readable,Writable,Duplex} */
        if (device.maxInputChannels > 0 && device.maxOutputChannels > 0) {
            this.log("info", `resolved "${this.params.device}" to duplex device "${device.id}"`);
            this.input = "audio";
            this.output = "audio";
            this.io = naudiodon_1.default.AudioIO({
                inOptions: {
                    deviceId: device.id,
                    channelCount: this.config.audioChannels,
                    sampleRate: this.config.audioSampleRate,
                    sampleFormat: this.config.audioBitDepth
                },
                outOptions: {
                    deviceId: device.id,
                    channelCount: this.config.audioChannels,
                    sampleRate: this.config.audioSampleRate,
                    sampleFormat: this.config.audioBitDepth
                }
            });
            this.stream = this.io;
        }
        else if (device.maxInputChannels > 0 && device.maxOutputChannels === 0) {
            this.log("info", `resolved "${this.params.device}" to input device "${device.id}"`);
            this.input = "none";
            this.output = "audio";
            this.io = naudiodon_1.default.AudioIO({
                inOptions: {
                    deviceId: device.id,
                    channelCount: this.config.audioChannels,
                    sampleRate: this.config.audioSampleRate,
                    sampleFormat: this.config.audioBitDepth
                }
            });
            this.stream = this.io;
        }
        else if (device.maxInputChannels === 0 && device.maxOutputChannels > 0) {
            this.log("info", `resolved "${this.params.device}" to output device "${device.id}"`);
            this.input = "audio";
            this.output = "none";
            this.io = naudiodon_1.default.AudioIO({
                outOptions: {
                    deviceId: device.id,
                    channelCount: this.config.audioChannels,
                    sampleRate: this.config.audioSampleRate,
                    sampleFormat: this.config.audioBitDepth
                }
            });
            this.stream = this.io;
        }
        else
            throw new Error(`device "${device.id}" does not have any input or output channels`);
        /* pass-through errors */
        this.io.on("error", (err) => {
            this.emit("error", err);
        });
    }
    async close() {
        if (this.io !== null)
            this.io.quit();
    }
}
exports.default = SpeechFlowNodeDevice;
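
The `speechflow_util_1.default.audioDeviceFromURL()` helper used above is not part of this diff excerpt. The sketch below only illustrates how a `"<api>:<name>"` device string such as `"wasapi:VoiceMeeter Out B1"` could be resolved against the device list; it assumes the `@gpeng/naudiodon` fork keeps upstream naudiodon's `getDevices()` listing, whose entries carry `name`, `maxInputChannels`, `maxOutputChannels`, `defaultSampleRate`, and a `hostAPIName` field. The actual helper in `speechflow-util` may differ.

```
const portAudio = require("@gpeng/naudiodon");

/* hypothetical resolver: "<api>:<name>" -> naudiodon device info
   (mode "r"/"w"/"rw" is accepted to mirror the call site above, but ignored here) */
function audioDeviceFromURL(mode, url) {
    const m = /^(.+?):(.+)$/.exec(url);
    if (m === null)
        throw new Error(`invalid device specification "${url}"`);
    const [ , api, name ] = m;
    const device = portAudio.getDevices().find((d) =>
        d.hostAPIName.toLowerCase().includes(api.toLowerCase())  /* assumed field name */
        && d.name === name);
    if (device === undefined)
        throw new Error(`no such audio device "${url}"`);
    return device;
}
```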