speechflow 1.3.0 → 1.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +15 -0
- package/README.md +165 -22
- package/dst/speechflow-node-a2a-gender.d.ts +2 -0
- package/dst/speechflow-node-a2a-gender.js +137 -59
- package/dst/speechflow-node-a2a-gender.js.map +1 -1
- package/dst/speechflow-node-a2a-meter.d.ts +3 -1
- package/dst/speechflow-node-a2a-meter.js +79 -35
- package/dst/speechflow-node-a2a-meter.js.map +1 -1
- package/dst/speechflow-node-a2a-mute.d.ts +1 -0
- package/dst/speechflow-node-a2a-mute.js +37 -11
- package/dst/speechflow-node-a2a-mute.js.map +1 -1
- package/dst/speechflow-node-a2a-vad.d.ts +3 -0
- package/dst/speechflow-node-a2a-vad.js +194 -96
- package/dst/speechflow-node-a2a-vad.js.map +1 -1
- package/dst/speechflow-node-a2a-wav.js +27 -11
- package/dst/speechflow-node-a2a-wav.js.map +1 -1
- package/dst/speechflow-node-a2t-deepgram.d.ts +4 -0
- package/dst/speechflow-node-a2t-deepgram.js +141 -43
- package/dst/speechflow-node-a2t-deepgram.js.map +1 -1
- package/dst/speechflow-node-t2a-elevenlabs.d.ts +2 -0
- package/dst/speechflow-node-t2a-elevenlabs.js +61 -12
- package/dst/speechflow-node-t2a-elevenlabs.js.map +1 -1
- package/dst/speechflow-node-t2a-kokoro.d.ts +1 -0
- package/dst/speechflow-node-t2a-kokoro.js +10 -4
- package/dst/speechflow-node-t2a-kokoro.js.map +1 -1
- package/dst/speechflow-node-t2t-deepl.js +8 -4
- package/dst/speechflow-node-t2t-deepl.js.map +1 -1
- package/dst/speechflow-node-t2t-format.js +2 -2
- package/dst/speechflow-node-t2t-format.js.map +1 -1
- package/dst/speechflow-node-t2t-ollama.js +1 -1
- package/dst/speechflow-node-t2t-ollama.js.map +1 -1
- package/dst/speechflow-node-t2t-openai.js +1 -1
- package/dst/speechflow-node-t2t-openai.js.map +1 -1
- package/dst/speechflow-node-t2t-sentence.d.ts +1 -1
- package/dst/speechflow-node-t2t-sentence.js +35 -24
- package/dst/speechflow-node-t2t-sentence.js.map +1 -1
- package/dst/speechflow-node-t2t-subtitle.js +85 -17
- package/dst/speechflow-node-t2t-subtitle.js.map +1 -1
- package/dst/speechflow-node-t2t-transformers.js +2 -2
- package/dst/speechflow-node-t2t-transformers.js.map +1 -1
- package/dst/speechflow-node-x2x-filter.js +4 -4
- package/dst/speechflow-node-x2x-trace.js +1 -1
- package/dst/speechflow-node-x2x-trace.js.map +1 -1
- package/dst/speechflow-node-xio-device.js +12 -8
- package/dst/speechflow-node-xio-device.js.map +1 -1
- package/dst/speechflow-node-xio-file.js +9 -3
- package/dst/speechflow-node-xio-file.js.map +1 -1
- package/dst/speechflow-node-xio-mqtt.js +5 -2
- package/dst/speechflow-node-xio-mqtt.js.map +1 -1
- package/dst/speechflow-node-xio-websocket.js +11 -11
- package/dst/speechflow-node-xio-websocket.js.map +1 -1
- package/dst/speechflow-utils.d.ts +5 -0
- package/dst/speechflow-utils.js +77 -44
- package/dst/speechflow-utils.js.map +1 -1
- package/dst/speechflow.js +104 -34
- package/dst/speechflow.js.map +1 -1
- package/etc/eslint.mjs +1 -2
- package/etc/speechflow.yaml +18 -7
- package/etc/stx.conf +3 -3
- package/package.json +14 -13
- package/src/speechflow-node-a2a-gender.ts +148 -64
- package/src/speechflow-node-a2a-meter.ts +87 -40
- package/src/speechflow-node-a2a-mute.ts +39 -11
- package/src/speechflow-node-a2a-vad.ts +206 -100
- package/src/speechflow-node-a2a-wav.ts +27 -11
- package/src/speechflow-node-a2t-deepgram.ts +148 -45
- package/src/speechflow-node-t2a-elevenlabs.ts +65 -12
- package/src/speechflow-node-t2a-kokoro.ts +11 -4
- package/src/speechflow-node-t2t-deepl.ts +9 -4
- package/src/speechflow-node-t2t-format.ts +2 -2
- package/src/speechflow-node-t2t-ollama.ts +1 -1
- package/src/speechflow-node-t2t-openai.ts +1 -1
- package/src/speechflow-node-t2t-sentence.ts +38 -27
- package/src/speechflow-node-t2t-subtitle.ts +62 -15
- package/src/speechflow-node-t2t-transformers.ts +4 -3
- package/src/speechflow-node-x2x-filter.ts +4 -4
- package/src/speechflow-node-x2x-trace.ts +1 -1
- package/src/speechflow-node-xio-device.ts +12 -8
- package/src/speechflow-node-xio-file.ts +9 -3
- package/src/speechflow-node-xio-mqtt.ts +5 -2
- package/src/speechflow-node-xio-websocket.ts +12 -12
- package/src/speechflow-utils.ts +78 -44
- package/src/speechflow.ts +117 -36
package/CHANGELOG.md
CHANGED

@@ -2,6 +2,21 @@
 ChangeLog
 =========
 
+1.3.2 (2025-08-04)
+------------------
+
+- BUGFIX: many timeout handling fixes in many nodes
+- CLEANUP: many code cleanups
+
+1.3.1 (2025-07-31)
+------------------
+
+- BUGFIX: wait a longer time for "deepgram" node to open
+- IMPROVEMENT: keep word information as meta information in "deepgram" node
+- IMPROVEMENT: support words in subtitle generation in "subtitle" node
+- BUGFIX: fix WebVTT format generation in "subtitle" node
+- UPGRADE: upgrade NPM dependencies
+
 1.3.0 (2025-07-26)
 ------------------
 
package/README.md
CHANGED

@@ -56,14 +56,14 @@ ships as an installable package for the Node Package Manager (NPM).
 Installation
 ------------
 
-```
+```sh
 $ npm install -g speechflow
 ```
 
 Usage
 -----
 
-```
+```sh
 $ speechflow
     [-h|--help]
     [-V|--version]
@@ -251,12 +251,19 @@ First a short overview of the available processing nodes:
 **filter**,
 **trace**.
 
-### Input/Output Nodes
+### Input/Output Nodes
+
+The following nodes are for external I/O, i.e., to read/write from
+external files, devices and network services.
 
 - Node: **file**<br/>
 Purpose: **File and StdIO source/sink**<br/>
 Example: `file(path: "capture.pcm", mode: "w", type: "audio")`
 
+> This node allows reading/writing from/to files or StdIO. It is
+> intended to be used as a source and sink node in batch processing,
+> and as a sink node in real-time processing.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text, audio |
@@ -274,6 +281,10 @@ First a short overview of the available processing nodes:
 Purpose: **Microphone/speaker device source/sink**<br/>
 Example: `device(device: "wasapi:VoiceMeeter Out B1", mode: "r")`
 
+> This node allows reading/writing from/to audio devices. It is
+> intended to be used as a source node for microphone devices and as
+> a sink node for speaker devices.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -290,6 +301,11 @@ First a short overview of the available processing nodes:
 Example: `websocket(connect: "ws://127.0.0.1:12345", type: "text")`
 Notice: this node requires a peer WebSocket service!
 
+> This node allows reading/writing from/to WebSocket network services.
+> It is primarily intended to be used for sending out the text of
+> subtitles, but it can also be used for receiving the text to be
+> processed.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text, audio |
@@ -306,6 +322,10 @@ First a short overview of the available processing nodes:
 Example: `mqtt(url: "mqtt://127.0.0.1:1883", username: "foo", password: "bar", topic: "quux")`
 Notice: this node requires a peer MQTT broker!
 
+> This node allows reading/writing from/to MQTT broker topics. It is
+> primarily intended to be used for sending out the text of subtitles,
+> but it can also be used for receiving the text to be processed.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -313,17 +333,23 @@ First a short overview of the available processing nodes:
 
 | Parameter    | Position  | Default  | Requirement           |
 | ------------ | --------- | -------- | --------------------- |
-| **url**      | 0         | *none*   | `/^(?:\|(?:ws
+| **url**      | 0         | *none*   | `/^(?:\|(?:ws\|mqtt):\/\/(.+?):(\d+))$/` |
 | **username** | 1         | *none*   | `/^.+$/` |
 | **password** | 2         | *none*   | `/^.+$/` |
 | **topic**    | 3         | *none*   | `/^.+$/` |
 
-### Audio-to-Audio Nodes
+### Audio-to-Audio Nodes
+
+The following nodes process audio chunks only.
 
 - Node: **ffmpeg**<br/>
 Purpose: **FFmpeg audio format conversion**<br/>
 Example: `ffmpeg(src: "pcm", dst: "mp3")`
 
+> This node allows converting between audio formats. It is primarily
+> intended to support the reading/writing of external MP3 and Opus
+> format files, although SpeechFlow internally uses PCM format only.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -338,6 +364,10 @@ First a short overview of the available processing nodes:
 Purpose: **WAV audio format conversion**<br/>
 Example: `wav(mode: "encode")`
 
+> This node allows converting between PCM and WAV audio formats. It is
+> primarily intended to support the reading/writing of external WAV
+> format files, although SpeechFlow internally uses PCM format only.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -352,6 +382,9 @@ First a short overview of the available processing nodes:
 Example: `mute()`
 Notice: this node has to be externally controlled via REST/WebSockets!
 
+> This node allows muting the audio stream by either silencing or even
+> unplugging it. It has to be externally controlled via REST/WebSocket (see below).
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -364,6 +397,10 @@ First a short overview of the available processing nodes:
 Purpose: **Loudness metering node**<br/>
 Example: `meter(250)`
 
+> This node allows measuring the loudness of the audio stream. The
+> results are emitted to both the logfile of **SpeechFlow** and the
+> WebSockets API (see below).
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -377,6 +414,10 @@ First a short overview of the available processing nodes:
 Purpose: **Voice Audio Detection (VAD) node**<br/>
 Example: `vad()`
 
+> This node performs Voice Audio Detection (VAD), i.e., it detects
+> voice in the audio stream and, if no voice is detected, either
+> silences or unplugs the audio stream.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -384,7 +425,7 @@ First a short overview of the available processing nodes:
 
 | Parameter   | Position  | Default     | Requirement              |
 | ----------- | --------- | ----------- | ------------------------ |
-| **mode**    | *none*    | "unplugged" | `/^(?:silenced
+| **mode**    | *none*    | "unplugged" | `/^(?:silenced\|unplugged)$/` |
 | **posSpeechThreshold** | *none* | 0.50 | *none* |
 | **negSpeechThreshold** | *none* | 0.35 | *none* |
 | **minSpeechFrames**    | *none* | 2    | *none* |
@@ -396,6 +437,10 @@ First a short overview of the available processing nodes:
 Purpose: **Gender Detection node**<br/>
 Example: `gender()`
 
+> This node performs gender detection on the audio stream. It
+> annotates the audio chunks with `gender=male` or `gender=female`
+> meta information. Use this meta information with the "filter" node.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -405,13 +450,19 @@ First a short overview of the available processing nodes:
 | ----------- | --------- | -------- | ------------------------ |
 | **window**  | 0         | 500      | *none* |
 
-### Audio-to-Text Nodes
+### Audio-to-Text Nodes
+
+The following nodes convert audio to text chunks.
 
 - Node: **deepgram**<br/>
 Purpose: **Deepgram Speech-to-Text conversion**<br/>
 Example: `deepgram(language: "de")`<br/>
 Notice: this node requires an API key!
 
+> This node performs Speech-to-Text (S2T) conversion, i.e., it
+> recognizes speech in the input audio stream and outputs a
+> corresponding text stream.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | audio       |
@@ -425,13 +476,17 @@ First a short overview of the available processing nodes:
 | **version**  | 1 | "latest" | *none* |
 | **language** | 2 | "multi"  | *none* |
 
-### Text-to-Text Nodes
+### Text-to-Text Nodes
+
+The following nodes process text chunks only.
 
 - Node: **deepl**<br/>
 Purpose: **DeepL Text-to-Text translation**<br/>
 Example: `deepl(src: "de", dst: "en")`<br/>
 Notice: this node requires an API key!
 
+> This node performs translation between the English and German languages.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -448,6 +503,12 @@ First a short overview of the available processing nodes:
 Example: `openai(src: "de", dst: "en")`<br/>
 Notice: this node requires an OpenAI API key!
 
+> This node performs translation between the English and German
+> languages in the text stream or (if the source and destination
+> languages are the same) spellchecking of English or German text in
+> the text stream. It is based on the remote OpenAI cloud AI service
+> and uses the GPT-4o-mini LLM.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -464,7 +525,13 @@ First a short overview of the available processing nodes:
 - Node: **ollama**<br/>
 Purpose: **Ollama/Gemma Text-to-Text translation and spelling correction**<br/>
 Example: `ollama(src: "de", dst: "en")`<br/>
-Notice: this node requires
+Notice: this node requires Ollama to be installed!
+
+> This node performs translation between the English and German
+> languages in the text stream or (if the source and destination
+> languages are the same) spellchecking of English or German text in
+> the text stream. It is based on the local Ollama AI service and uses
+> the Google Gemma 3 LLM.
 
 | Port    | Payload     |
 | ------- | ----------- |
@@ -482,6 +549,9 @@ First a short overview of the available processing nodes:
 Purpose: **Transformers Text-to-Text translation**<br/>
 Example: `transformers(src: "de", dst: "en")`<br/>
 
+> This node performs translation between the English and German
+> languages in the text stream. It is based on local OPUS or SmolLM3 LLMs.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -489,7 +559,7 @@ First a short overview of the available processing nodes:
 
 | Parameter    | Position  | Default  | Requirement      |
 | ------------ | --------- | -------- | ---------------- |
-| **model**    | *none*    | "OPUS"   | `/^(?:OPUS
+| **model**    | *none*    | "OPUS"   | `/^(?:OPUS\|SmolLM3)$/` |
 | **src**      | 0         | "de"     | `/^(?:de\|en)$/` |
 | **dst**      | 1         | "en"     | `/^(?:de\|en)$/` |
 
@@ -497,6 +567,11 @@ First a short overview of the available processing nodes:
 Purpose: **sentence splitting/merging**<br/>
 Example: `sentence()`<br/>
 
+> This node allows you to ensure that a text stream is split or merged
+> into complete sentences. It is primarily intended to be used after
+> the "deepgram" node and before the "deepl" or "elevenlabs" nodes in
+> order to improve overall quality.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -509,6 +584,9 @@ First a short overview of the available processing nodes:
 Purpose: **SRT/VTT Subtitle Generation**<br/>
 Example: `subtitle(format: "srt")`<br/>
 
+> This node generates subtitles from the text stream (and its embedded
+> timestamps) in the formats SRT (SubRip) or VTT (WebVTT).
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
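To make the two subtitle formats concrete, a hypothetical cue (illustrative content, not taken from the package) would look as follows in SRT:

```
1
00:00:01,000 --> 00:00:03,500
Hello, world.
```

and in WebVTT, which uses `.` instead of `,` in timestamps and requires a `WEBVTT` header:

```
WEBVTT

00:00:01.000 --> 00:00:03.500
Hello, world.
```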
@@ -517,11 +595,16 @@ First a short overview of the available processing nodes:
 | Parameter    | Position  | Default  | Requirement        |
 | ------------ | --------- | -------- | ------------------ |
 | **format**   | *none*    | "srt"    | /^(?:srt\|vtt)$/   |
+| **words**    | *none*    | false    | *none*             |
 
 - Node: **format**<br/>
 Purpose: **text paragraph formatting**<br/>
 Example: `format(width: 80)`<br/>
 
+> This node formats the text stream into lines no longer than a
+> certain width. It is primarily intended for use before writing text
+> chunks to files.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -531,29 +614,43 @@ First a short overview of the available processing nodes:
 | ------------ | --------- | -------- | --------------------- |
 | **width**    | 0         | 80       | *none* |
 
-### Text-to-Audio Nodes
+### Text-to-Audio Nodes
+
+The following nodes convert text chunks to audio chunks.
 
 - Node: **elevenlabs**<br/>
 Purpose: **ElevenLabs Text-to-Speech conversion**<br/>
 Example: `elevenlabs(language: "en")`<br/>
-Notice: this node requires an API key!
+Notice: this node requires an ElevenLabs API key!
+
+> This node performs Text-to-Speech (T2S) conversion, i.e., it converts
+> the input text stream into an output audio stream. It is intended to
+> generate speech.
 
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
 | output  | audio       |
 
-| Parameter
-|
-| **key**
-| **voice**
-| **language**
+| Parameter      | Position  | Default   | Requirement        |
+| -------------- | --------- | --------- | ------------------ |
+| **key**        | *none*    | env.SPEECHFLOW\_ELEVENLABS\_KEY | *none* |
+| **voice**      | 0         | "Brian"   | `/^(?:Brittney\|Cassidy\|Leonie\|Mark\|Brian)$/` |
+| **language**   | 1         | "de"      | `/^(?:de\|en)$/` |
+| **speed**      | 2         | 1.00      | `n >= 0.7 && n <= 1.2` |
+| **stability**  | 3         | 0.5       | `n >= 0.0 && n <= 1.0` |
+| **similarity** | 4         | 0.75      | `n >= 0.0 && n <= 1.0` |
+| **optimize**   | 5         | "latency" | `/^(?:latency\|quality)$/` |
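Putting the parameters above together, a pipeline fragment (hypothetical values, kept within the documented ranges) might configure the node as:

```
elevenlabs(voice: "Brian", language: "en", speed: 1.0,
           stability: 0.5, similarity: 0.75, optimize: "quality")
```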
 
 - Node: **kokoro**<br/>
 Purpose: **Kokoro Text-to-Speech conversion**<br/>
 Example: `kokoro(language: "en")`<br/>
 Notice: this node currently supports the English language only!
 
+> This node performs Text-to-Speech (T2S) conversion, i.e., it converts
+> the input text stream into an output audio stream. It is intended to
+> generate speech.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text        |
@@ -561,16 +658,23 @@ First a short overview of the available processing nodes:
 
 | Parameter    | Position  | Default  | Requirement |
 | ------------ | --------- | -------- | ----------- |
-| **voice**    | 0         | "Aoede"  | `/^(?:Aoede
+| **voice**    | 0         | "Aoede"  | `/^(?:Aoede\|Heart\|Puck\|Fenrir)$/` |
 | **language** | 1         | "en"     | `/^en$/` |
 | **speed**    | 2         | 1.25     | 1.0...1.30 |
 
-### Any-to-Any Nodes
+### Any-to-Any Nodes
+
+The following nodes process any type of chunk, i.e., both audio and text chunks.
 
 - Node: **filter**<br/>
 Purpose: **meta information based filter**<br/>
 Example: `filter(type: "audio", var: "meta:gender", op: "==", val: "male")`<br/>
 
+> This node allows you to filter chunks based on certain criteria. It
+> is primarily intended to be used in conjunction with the "gender"
+> node and in front of the `elevenlabs` or `kokoro` nodes in order to
+> translate with a corresponding voice.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text, audio |
@@ -580,14 +684,18 @@ First a short overview of the available processing nodes:
 | ------------ | --------- | -------- | --------------------- |
 | **type**     | 0         | "audio"  | `/^(?:audio\|text)$/` |
 | **name**     | 1         | "filter" | `/^.+$/` |
-| **var**      | 2         | ""       | `/^(?:meta
-| **op**       | 3         | "=="     | `/^(
+| **var**      | 2         | ""       | `/^(?:meta:.+\|payload:(?:length\|text)\|time:(?:start\|end))$/` |
+| **op**       | 3         | "=="     | `/^(?:<\|<=\|==\|!=\|~~\|!~\|>=\|>)$/` |
 | **val**      | 4         | ""       | `/^.*$/` |
 
 - Node: **trace**<br/>
 Purpose: **data flow tracing**<br/>
 Example: `trace(type: "audio")`<br/>
 
+> This node allows you to trace the audio and text chunk flow through
+> the **SpeechFlow** graph. It just passes through its chunks, but
+> sends information about the chunks to the log.
+
 | Port    | Payload     |
 | ------- | ----------- |
 | input   | text, audio |
@@ -598,10 +706,45 @@ First a short overview of the available processing nodes:
 | **type**     | 0         | "audio"  | `/^(?:audio\|text)$/` |
 | **name**     | 1         | *none*   | *none* |
 
+REST/WebSocket API
+------------------
+
+**SpeechFlow** has an externally exposed REST/WebSockets API which can
+be used to control the nodes and to receive information from nodes.
+For controlling a node you have three possibilities (illustrated by
+controlling the mode of the "mute" node):
+
+```sh
+# use HTTP/REST/GET:
+$ curl http://127.0.0.1:8484/api/COMMAND/mute/mode/silenced
+```
+
+```sh
+# use HTTP/REST/POST:
+$ curl -H "Content-type: application/json" \
+    --data '{ "request": "COMMAND", "node": "mute", "args": [ "mode", "silenced" ] }' \
+    http://127.0.0.1:8484/api
+```
+
+```sh
+# use WebSockets:
+$ wscat -c ws://127.0.0.1:8484/api \
+> { "request": "COMMAND", "node": "mute", "args": [ "mode", "silenced" ] }
+```
+
+For receiving emitted information from nodes, you have to use the
+WebSockets API (illustrated by the emitted information of the "meter"
+node):
+
+```sh
+# use WebSockets:
+$ wscat -c ws://127.0.0.1:8484/api \
+< { "response": "NOTIFY", "node": "meter", "args": [ "meter", "LUFS-S", -35.75127410888672 ] }
+```
+
|
|
602
745
|
-------
|
|
603
746
|
|
|
604
|
-
**
|
|
747
|
+
**SpeechFlow**, as a technical cut-through, was initially created in
|
|
605
748
|
March 2024 for use in the msg Filmstudio context. It was later refined
|
|
606
749
|
into a more complete toolkit in April 2025 and this way the first time
|
|
607
750
|
could be used in production. It was fully refactored in July 2025 in
|