@uzen/kokoro-js 1.2.3 → 1.2.4
- package/LICENSE +1 -1
- package/README-zh.md +144 -0
- package/README.md +95 -117
- package/package.json +12 -4
package/LICENSE
CHANGED
@@ -186,7 +186,7 @@
 same "printed page" as the copyright notice for easier
 identification within third-party archives.
 
-Copyright
+Copyright 2026 uzen-zone
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
package/README-zh.md
ADDED
@@ -0,0 +1,144 @@
+# Kokoro TTS
+
+<p align="center">
+<a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/@uzen%2Fkokoro-js"></a>
+<a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@uzen%2Fkokoro-js"></a>
+<a href="https://www.jsdelivr.com/package/npm/@uzen/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@uzen%2Fkokoro-js"></a>
+<a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="License" src="https://img.shields.io/npm/l/@uzen%2Fkokoro-js?color=blue"></a>
+</p>
+
+Language: [English](./README.md) | 简体中文
+
+`@uzen/kokoro-js` is a JavaScript runtime wrapper for Kokoro TTS based on [Transformers.js](https://huggingface.co/docs/transformers.js). The current version is wired for the v1.1 Chinese model, includes a built-in Chinese phonemization pipeline, and supports Mandarin as well as mixed Chinese/English text.
+
+## Goals
+
+- Add browser-local Mandarin and mixed Chinese/English TTS support to `kokoro.js`.
+- Use the Python `kokoro` and `misaki` v1.1 Chinese frontend as the behavior reference for Chinese phonemization, text normalization, tone sandhi, and erhua handling.
+- Keep the JavaScript package self-contained at runtime and suitable for browsers; Python serves only as a behavior reference, not as a runtime dependency.
+- Target the v1.1 Chinese ONNX model, local voice assets, and a WebGPU-first path to provide a practically deployable web inference experience.
+- Keep English spans in mixed Chinese/English text as readable output by routing them through the existing English phonemization path instead of dropping them or replacing them with unknown symbols.
+
+## Current Status
+
+- Chinese phonemization emits v1.1 Chinese tokenizer symbols directly and no longer falls back to English espeak phonemization.
+- Common Mandarin text normalization cases are covered, including times, dates, numeric ranges, fractions, negative numbers, percentages, phone numbers, temperatures, measurements, money amounts, and numbers with measure words.
+- A set of tone-related rules is implemented, including common neutral-tone words and particles, neutral tone in reduplicated words, `一`/`不` tone sandhi, third-tone sandhi, and erhua handling.
+- The golden corpus uses Python `misaki` v1.1 output as the reference; the current plan records 164 examples, of which 143 match and 21 are known gaps.
+- The remaining gaps mainly come from segmentation differences between `Intl.Segmenter` and `jieba.posseg`, missing POS information in the browser path, and long-tail text normalization rules that are not yet fully covered.
+
+## Features
+
+- Runs Kokoro TTS in the browser and Node.js through Transformers.js.
+- Supports the v1.1 Chinese ONNX model: `onnx-community/Kokoro-82M-v1.1-zh-ONNX`.
+- Supports Mandarin text and mixed Chinese/English text; Chinese goes through an explicit Chinese phonemization path.
+- Loads local voice files through `voicePath`; the npm package does not bundle voice `.bin` files.
+- Supports single-shot generation with `tts.generate()` and chunked generation with `tts.stream()`.
+
+## Areas For Improvement
+
+- The application must host or copy the voice `.bin` files itself before inference.
+- The currently registered English voices are only `af_maple`, `af_sol`, and `bf_vale`.
+- Chinese phonemization is implemented in JavaScript; long-tail polyphonic characters or text normalization cases may still differ from Python `misaki`.
+- Browser performance depends on WebGPU availability; falling back to `wasm` is recommended only when `webgpu` is unavailable.
+- `fp32`, `fp16`, and `q4f16` are recommended; `q8` and `q4` are supported by the loader but not recommended for this model.
+
+## Install
+
+```bash
+npm i @uzen/kokoro-js
+```
+
+Installing this package also installs its runtime dependencies, including `@huggingface/transformers`. Transformers.js in turn brings in ONNX Runtime transitive dependencies such as `onnxruntime-web`; applications do not need to install these packages separately.
+
+## Model And Voices
+
+Use the v1.1 Chinese ONNX model:
+
+```txt
+onnx-community/Kokoro-82M-v1.1-zh-ONNX
+```
+
+Voice `.bin` files are not bundled in the npm package. Download the voice files you need from the model repository and expose them to the runtime through `voicePath`.
+
+- Browser/Vite apps: put files such as `zf_001.bin` under `public/kokoro/voices/`; the default `voicePath` is `/kokoro/voices`.
+- Node.js: put the voice files in a local directory and pass that directory as `voicePath`, for example `voicePath: "./voices"`.
+
+The currently registered voices are the v1.1 Chinese voices, two American English voices, and one British English voice:
+
+```txt
+af_maple, af_sol, bf_vale
+zf_001, zf_002, zf_003, zf_004, zf_005, zf_006, zf_007, zf_008,
+zf_017, zf_018, zf_019, zf_021, zf_022, zf_023, zf_024, zf_026,
+zf_027, zf_028, zf_032, zf_036, zf_038, zf_039, zf_040, zf_042,
+zf_043, zf_044, zf_046, zf_047, zf_048, zf_049, zf_051, zf_059,
+zf_060, zf_067, zf_070, zf_071, zf_072, zf_073, zf_074, zf_075,
+zf_076, zf_077, zf_078, zf_079, zf_083, zf_084, zf_085, zf_086,
+zf_087, zf_088, zf_090, zf_092, zf_093, zf_094, zf_099,
+zm_009, zm_010, zm_011, zm_012, zm_013, zm_014, zm_015, zm_016,
+zm_020, zm_025, zm_029, zm_030, zm_031, zm_033, zm_034, zm_035,
+zm_037, zm_041, zm_045, zm_050, zm_052, zm_053, zm_054, zm_055,
+zm_056, zm_057, zm_058, zm_061, zm_062, zm_063, zm_064, zm_065,
+zm_066, zm_068, zm_069, zm_080, zm_081, zm_082, zm_089, zm_091,
+zm_095, zm_096, zm_097, zm_098, zm_100
+```
+
+At runtime, treat the result of `tts.list_voices()` or `tts.voices` as authoritative.
+
+## Generate Speech
+
+```js
+import { KokoroTTS } from "@uzen/kokoro-js";
+
+const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
+const tts = await KokoroTTS.from_pretrained(model_id, {
+dtype: "fp32",
+device: "webgpu",
+voicePath: "/kokoro/voices",
+});
+
+const audio = await tts.generate("你好,欢迎使用 Kokoro 中文语音。", {
+voice: "zf_001",
+speed: 1,
+});
+
+audio.save("audio.wav");
+```
+
+The recommended default `dtype` is `"fp32"`, and the recommended default `device` in browsers is `"webgpu"`. Recommended `dtype` values are `"fp32"`, `"fp16"`, and `"q4f16"`; quantized variants such as `"q8"` and `"q4"` are supported by the loader but not recommended for this model. Supported `device` values are `"webgpu"`, `"wasm"`, `"cpu"`, or `null`; browsers should prefer `"webgpu"` and fall back to `"wasm"` when it is unavailable, while Node.js uses `"cpu"`.
+
+## Stream Speech
+
+```js
+import { KokoroTTS, TextSplitterStream } from "@uzen/kokoro-js";
+
+const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
+const tts = await KokoroTTS.from_pretrained(model_id, {
+dtype: "fp32",
+device: "webgpu",
+});
+
+const splitter = new TextSplitterStream();
+const stream = tts.stream(splitter, { voice: "zf_001" });
+
+(async () => {
+let index = 0;
+for await (const { text, phonemes, audio } of stream) {
+console.log({ text, phonemes });
+audio.save(`audio-${index++}.wav`);
+}
+})();
+
+splitter.push("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
+splitter.close();
+```
+
+`tts.stream()` can also take a string directly. By default it uses `TextSplitterStream` to split sentences and further splits overly long chunks at punctuation.
+
+## API Notes
+
+- `KokoroTTS.from_pretrained(model_id, options)` loads the model and tokenizer through Transformers.js.
+- `tts.generate(text, { voice, speed })` returns a Transformers.js `RawAudio` object.
+- `tts.stream(textOrSplitter, options)` yields `{ text, phonemes, audio }` for each generated chunk.
+- `voicePath` is a base path, not a full file path; browsers load `${voicePath}/${voice}.bin`, and Node.js resolves the same pattern from the local filesystem.
+- Chinese voices starting with `zf_` and `zm_` use the Chinese phonemization path; English voices starting with `af_` and `bf_` use the English phonemization path.
package/README.md
CHANGED
@@ -4,163 +4,141 @@
 <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/@uzen%2Fkokoro-js"></a>
 <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@uzen%2Fkokoro-js"></a>
 <a href="https://www.jsdelivr.com/package/npm/@uzen/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@uzen%2Fkokoro-js"></a>
-<a href="https://
-<a href="https://huggingface.co/spaces/webml-community/kokoro-webgpu"><img alt="Demo" src="https://img.shields.io/badge/Hugging_Face-demo-green"></a>
+<a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="License" src="https://img.shields.io/npm/l/@uzen%2Fkokoro-js?color=blue"></a>
 </p>
 
-
+Language: English | [简体中文](./README-zh.md)
 
-
+`@uzen/kokoro-js` is a JavaScript runtime wrapper for Kokoro TTS based on [Transformers.js](https://huggingface.co/docs/transformers.js). This fork is wired for the v1.1 Chinese model and supports Mandarin plus mixed Chinese/English text through the built-in phonemizer.
 
-
+## Goals
+
+- Add browser-local Mandarin and mixed Chinese/English TTS support to `kokoro.js`.
+- Use Python `kokoro` plus `misaki` v1.1 Chinese frontend behavior as the reference for Chinese phonemization, normalization, tone sandhi, and erhua handling.
+- Keep the JavaScript package self-contained and browser-friendly at runtime; Python is used only as a behavior reference, not as a runtime dependency.
+- Target practical web inference with the v1.1 Chinese ONNX model, local voice assets, and a WebGPU-first default path.
+- Preserve usable English spans in mixed text by routing them through the existing English phonemizer path instead of dropping them.
+
+## Current Status
+
+- Chinese phonemization now emits v1.1 Chinese tokenizer symbols directly instead of falling back to English espeak phonemization.
+- The implementation covers common Mandarin normalization cases including time expressions, dates, numeric ranges, fractions, negative numbers, percentages, phone numbers, temperatures, measurements, money, and quantified numbers.
+- Tone-related handling includes selected neutral-tone words and particles, reduplication, `一`/`不` sandhi, third-tone sandhi, and erhua handling.
+- Golden corpus tracking is based on Python `misaki` v1.1 outputs; the current plan records 164 examples with 143 matches and 21 known gaps.
+- Remaining gaps mainly come from `Intl.Segmenter` versus `jieba.posseg` segmentation differences, missing POS information in the browser path, and long-tail text normalization cases.
+
+## Features
+
+- Runs Kokoro TTS in browser and Node.js through Transformers.js.
+- Supports the v1.1 Chinese ONNX model: `onnx-community/Kokoro-82M-v1.1-zh-ONNX`.
+- Supports Mandarin text and mixed Chinese/English text with explicit Chinese phonemization.
+- Provides local voice loading through `voicePath`; voice `.bin` files are not bundled in the npm package.
+- Supports single-shot generation through `tts.generate()` and chunked generation through `tts.stream()`.
+
+## Known Limitations
+
+- Voice assets must be hosted or copied by the application before inference can run.
+- The registered English voice set is limited to `af_maple`, `af_sol`, and `bf_vale`.
+- Chinese phonemization is implemented in JavaScript and may still differ from Python `misaki` on long-tail polyphones or text normalization cases.
+- Browser performance depends on WebGPU availability; use `wasm` only as a fallback when `webgpu` is unavailable.
+- `fp32`, `fp16`, and `q4f16` are recommended; `q8` and `q4` are supported by the loader but not recommended for this model.
+
+## Install
 
 ```bash
 npm i @uzen/kokoro-js
 ```
 
-
+Installing this package also installs its runtime dependencies, including `@huggingface/transformers`. Transformers.js brings in ONNX Runtime packages such as `onnxruntime-web` as transitive dependencies; applications do not need to install them separately.
 
-
-- Node.js: place the files in a local directory and pass `voicePath`, for example `voicePath: "./voices"`.
+## Model And Voices
 
-
+Use the v1.1 Chinese ONNX model:
+
+```txt
+onnx-community/Kokoro-82M-v1.1-zh-ONNX
+```
+
+Voice `.bin` files are not bundled in the npm package. Download the voice files you need from the model repository and expose them through `voicePath`.
+
+- Browser/Vite apps: put files such as `zf_001.bin` under `public/kokoro/voices/`; the default `voicePath` is `/kokoro/voices`.
+- Node.js: put the files in a local directory and pass that directory as `voicePath`, for example `voicePath: "./voices"`.
+
+Current registered voices are the v1.1 Chinese voices plus two American English voices and one British English voice:
+
+```txt
+af_maple, af_sol, bf_vale
+zf_001, zf_002, zf_003, zf_004, zf_005, zf_006, zf_007, zf_008,
+zf_017, zf_018, zf_019, zf_021, zf_022, zf_023, zf_024, zf_026,
+zf_027, zf_028, zf_032, zf_036, zf_038, zf_039, zf_040, zf_042,
+zf_043, zf_044, zf_046, zf_047, zf_048, zf_049, zf_051, zf_059,
+zf_060, zf_067, zf_070, zf_071, zf_072, zf_073, zf_074, zf_075,
+zf_076, zf_077, zf_078, zf_079, zf_083, zf_084, zf_085, zf_086,
+zf_087, zf_088, zf_090, zf_092, zf_093, zf_094, zf_099,
+zm_009, zm_010, zm_011, zm_012, zm_013, zm_014, zm_015, zm_016,
+zm_020, zm_025, zm_029, zm_030, zm_031, zm_033, zm_034, zm_035,
+zm_037, zm_041, zm_045, zm_050, zm_052, zm_053, zm_054, zm_055,
+zm_056, zm_057, zm_058, zm_061, zm_062, zm_063, zm_064, zm_065,
+zm_066, zm_068, zm_069, zm_080, zm_081, zm_082, zm_089, zm_091,
+zm_095, zm_096, zm_097, zm_098, zm_100
+```
+
+Call `tts.list_voices()` or read `tts.voices` at runtime for the authoritative list from the installed package.
+
+## Generate Speech
 
 ```js
 import { KokoroTTS } from "@uzen/kokoro-js";
 
 const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
 const tts = await KokoroTTS.from_pretrained(model_id, {
-dtype: "
-device: "
-
+dtype: "fp32",
+device: "webgpu",
+voicePath: "/kokoro/voices",
 });
 
-const
-const audio = await tts.generate(text, {
-// Use `tts.list_voices()` to list all available voices
+const audio = await tts.generate("你好,欢迎使用 Kokoro 中文语音。", {
 voice: "zf_001",
+speed: 1,
 });
+
 audio.save("audio.wav");
 ```
 
-
+Use `"fp32"` as the default `dtype` and `"webgpu"` as the default browser `device`. Recommended `dtype` values are `"fp32"`, `"fp16"`, and `"q4f16"`. Other supported quantized variants such as `"q8"` and `"q4"` are not recommended for this model. Supported `device` values are `"webgpu"`, `"wasm"`, `"cpu"`, or `null`; browser apps should prefer `"webgpu"` when available and can fall back to `"wasm"`, while Node.js uses `"cpu"`.
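Reviewer note: the device guidance in the added paragraph can be sketched as a small selection helper. This is a minimal sketch under stated assumptions, not part of `@uzen/kokoro-js`: `pickDevice` and its `env` parameter are hypothetical names, a `document` global is assumed to mean "browser page", and the presence of `navigator.gpu` is assumed to approximate WebGPU availability.

```js
// Hypothetical helper (not part of @uzen/kokoro-js): choose a `device` value
// following the README guidance. `env` defaults to the global object so the
// function can also be exercised with a fake environment.
function pickDevice(env = globalThis) {
  // Assumption: a `document` global means we are running in a browser page.
  const inBrowser = typeof env.document !== "undefined";
  if (!inBrowser) return "cpu"; // the README says Node.js uses "cpu"

  // Assumption: the presence of `navigator.gpu` approximates WebGPU support.
  const hasWebGPU = Boolean(env.navigator && env.navigator.gpu);
  return hasWebGPU ? "webgpu" : "wasm"; // "wasm" only as a fallback
}

console.log(pickDevice());
```

The result could then be passed as `device` to `KokoroTTS.from_pretrained(model_id, { dtype: "fp32", device: pickDevice() })`. A stricter check would also await `navigator.gpu.requestAdapter()`, because the property can exist without a usable adapter, and a Web Worker has no `document`, so worker-based apps would need a different browser test.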
+
+## Stream Speech
 
 ```js
 import { KokoroTTS, TextSplitterStream } from "@uzen/kokoro-js";
 
 const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
 const tts = await KokoroTTS.from_pretrained(model_id, {
-dtype: "fp32",
-
+dtype: "fp32",
+device: "webgpu",
 });
 
-// First, set up the stream
 const splitter = new TextSplitterStream();
 const stream = tts.stream(splitter, { voice: "zf_001" });
+
 (async () => {
-let
+let index = 0;
 for await (const { text, phonemes, audio } of stream) {
 console.log({ text, phonemes });
-audio.save(`audio-${
+audio.save(`audio-${index++}.wav`);
 }
 })();
 
-
-// For this example, let's pretend we're consuming text from an LLM, one word at a time.
-const text = "Kokoro 是一个轻量级本地语音模型,支持中文和中英混合文本。它可以在浏览器里通过 Transformers.js 运行,也可以在 Node.js 中生成音频文件。";
-const tokens = text.match(/\s*\S+/g);
-for (const token of tokens) {
-splitter.push(token);
-await new Promise((resolve) => setTimeout(resolve, 10));
-}
-
-// Finally, close the stream to signal that no more text will be added.
+splitter.push("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
 splitter.close();
-
-// Alternatively, if you'd like to keep the stream open, but flush any remaining text, you can use the `flush` method.
-// splitter.flush();
 ```
 
-
-
-
-
-
-
-
-
-
-| **af_heart** | 🚺❤️ | | | **A** |
-| af_alloy | 🚺 | B | MM minutes | C |
-| af_aoede | 🚺 | B | H hours | C+ |
-| af_bella | 🚺🔥 | **A** | **HH hours** | **A-** |
-| af_jessica | 🚺 | C | MM minutes | D |
-| af_kore | 🚺 | B | H hours | C+ |
-| af_nicole | 🚺🎧 | B | **HH hours** | B- |
-| af_nova | 🚺 | B | MM minutes | C |
-| af_river | 🚺 | C | MM minutes | D |
-| af_sarah | 🚺 | B | H hours | C+ |
-| af_sky | 🚺 | B | _M minutes_ 🤏 | C- |
-| am_adam | 🚹 | D | H hours | F+ |
-| am_echo | 🚹 | C | MM minutes | D |
-| am_eric | 🚹 | C | MM minutes | D |
-| am_fenrir | 🚹 | B | H hours | C+ |
-| am_liam | 🚹 | C | MM minutes | D |
-| am_michael | 🚹 | B | H hours | C+ |
-| am_onyx | 🚹 | C | MM minutes | D |
-| am_puck | 🚹 | B | H hours | C+ |
-| am_santa | 🚹 | C | _M minutes_ 🤏 | D- |
-
-### Chinese (Mandarin) — v1.1 zh voices
-
-| Name | Gender | Name | Gender | Name | Gender |
-| -------- | ------ | ------- | ------ | ------- | ------ |
-| zf_001 | 🚺 | zf_046 | 🚺 | zm_052 | 🚹 |
-| zf_002 | 🚺 | zf_047 | 🚺 | zm_053 | 🚹 |
-| zf_003 | 🚺 | zf_048 | 🚺 | zm_054 | 🚹 |
-| zf_004 | 🚺 | zf_049 | 🚺 | zm_055 | 🚹 |
-| zf_005 | 🚺 | zf_051 | 🚺 | zm_056 | 🚹 |
-| zf_006 | 🚺 | zf_059 | 🚺 | zm_057 | 🚹 |
-| zf_007 | 🚺 | zf_060 | 🚺 | zm_058 | 🚹 |
-| zf_008 | 🚺 | zf_067 | 🚺 | zm_061 | 🚹 |
-| zf_017 | 🚺 | zf_070 | 🚺 | zm_062 | 🚹 |
-| zf_018 | 🚺 | zf_071 | 🚺 | zm_063 | 🚹 |
-| zf_019 | 🚺 | zf_072 | 🚺 | zm_064 | 🚹 |
-| zf_021 | 🚺 | zf_073 | 🚺 | zm_065 | 🚹 |
-| zf_022 | 🚺 | zf_074 | 🚺 | zm_066 | 🚹 |
-| zf_023 | 🚺 | zf_075 | 🚺 | zm_068 | 🚹 |
-| zf_024 | 🚺 | zf_076 | 🚺 | zm_069 | 🚹 |
-| zf_026 | 🚺 | zf_077 | 🚺 | zm_080 | 🚹 |
-| zf_027 | 🚺 | zf_078 | 🚺 | zm_081 | 🚹 |
-| zf_028 | 🚺 | zf_079 | 🚺 | zm_082 | 🚹 |
-| zf_032 | 🚺 | zf_083 | 🚺 | zm_089 | 🚹 |
-| zf_036 | 🚺 | zf_084 | 🚺 | zm_091 | 🚹 |
-| zf_038 | 🚺 | zf_085 | 🚺 | zm_095 | 🚹 |
-| zf_039 | 🚺 | zf_086 | 🚺 | zm_096 | 🚹 |
-| zf_040 | 🚺 | zf_087 | 🚺 | zm_097 | 🚹 |
-| zf_042 | 🚺 | zf_088 | 🚺 | zm_098 | 🚹 |
-| zf_043 | 🚺 | zf_090 | 🚺 | zm_100 | 🚹 |
-| zf_044 | 🚺 | zf_092 | 🚺 | zm_009 | 🚹 |
-| zf_092 | 🚺 | zf_093 | 🚺 | zm_010 | 🚹 |
-| zf_094 | 🚺 | zf_099 | 🚺 | zm_011 | 🚹 |
-| zm_012 | 🚹 | zm_020 | 🚹 | zm_029 | 🚹 |
-| zm_013 | 🚹 | zm_025 | 🚹 | zm_030 | 🚹 |
-| zm_014 | 🚹 | zm_031 | 🚹 | zm_033 | 🚹 |
-| zm_015 | 🚹 | zm_034 | 🚹 | zm_035 | 🚹 |
-| zm_016 | 🚹 | zm_037 | 🚹 | zm_041 | 🚹 |
-| zm_045 | 🚹 | zm_050 | 🚹 | | |
-
-Use with v1.1 zh model: `onnx-community/Kokoro-82M-v1.1-zh-ONNX`.
-
-### British English
-
-| Name | Traits | Target Quality | Training Duration | Overall Grade |
-| ----------- | ------ | -------------- | ----------------- | ------------- |
-| bf_alice | 🚺 | C | MM minutes | D |
-| bf_emma | 🚺 | B | **HH hours** | B- |
-| bf_isabella | 🚺 | B | MM minutes | C |
-| bf_lily | 🚺 | C | MM minutes | D |
-| bm_daniel | 🚹 | C | MM minutes | D |
-| bm_fable | 🚹 | B | MM minutes | C |
-| bm_george | 🚹 | B | MM minutes | C |
-| bm_lewis | 🚹 | C | H hours | D+ |
+`tts.stream()` also accepts a string directly. By default it uses `TextSplitterStream` sentence splitting and further splits long chunks at punctuation boundaries.
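Reviewer note: to illustrate the kind of sentence splitting this line describes, here is a minimal sketch that cuts text after Chinese and English sentence-ending punctuation. It is not the package's `TextSplitterStream` implementation; `splitSentences` and the terminator set are assumptions chosen only for illustration.

```js
// Illustrative only: NOT the TextSplitterStream implementation.
// Splits text after sentence-ending punctuation (Chinese and English).
function splitSentences(text) {
  // Assumed terminator set; the real splitter may use a different one.
  const pieces = text.match(/[^。！？.!?]+[。！？.!?]*/g) ?? [];
  return pieces.map((s) => s.trim()).filter((s) => s.length > 0);
}

const chunks = splitSentences("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
console.log(chunks);
```

A regex this simple also splits at the dots in strings such as `Transformers.js` or `v1.1`, which is one reason a real splitter needs more context than a single character class.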
+
+## API Notes
+
+- `KokoroTTS.from_pretrained(model_id, options)` loads the model and tokenizer through Transformers.js.
+- `tts.generate(text, { voice, speed })` returns a Transformers.js `RawAudio` object.
+- `tts.stream(textOrSplitter, options)` yields `{ text, phonemes, audio }` for each generated chunk.
+- `voicePath` is a base path, not a full file path; the runtime loads `${voicePath}/${voice}.bin` in browsers and resolves the same pattern from the local filesystem in Node.js.
+- Chinese voices begin with `zf_` or `zm_` and use the Chinese phonemizer path. English voices beginning with `af_` or `bf_` use the English phonemizer path.
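Reviewer note: the last two API notes in the README diff (the `voicePath` pattern and the voice-prefix routing) can be sketched as follows. This is a hedged illustration, not the package's code: `voiceFileUrl` and `phonemizerPathFor` are hypothetical helpers, and trimming a trailing slash from `voicePath` is an assumption rather than documented behavior.

```js
// Hypothetical helpers (not exported by @uzen/kokoro-js).

// Build the voice file location from a base path and a voice id,
// following the `${voicePath}/${voice}.bin` pattern from the API notes.
function voiceFileUrl(voicePath, voice) {
  const base = voicePath.replace(/\/+$/, ""); // assumption: tolerate a trailing slash
  return `${base}/${voice}.bin`;
}

// Map a voice id to the phonemizer path implied by its prefix.
function phonemizerPathFor(voice) {
  if (/^z[fm]_/.test(voice)) return "chinese"; // zf_ / zm_
  if (/^[ab]f_/.test(voice)) return "english"; // af_ / bf_
  return "unknown"; // prefixes not covered by the API notes
}

console.log(voiceFileUrl("/kokoro/voices", "zf_001"));
console.log(phonemizerPathFor("bf_vale"));
```

Because `voicePath` is a base path, passing a full file path such as `/kokoro/voices/zf_001.bin` would produce a doubled file name under this pattern.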
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@uzen/kokoro-js",
-"version": "1.2.
+"version": "1.2.4",
 "type": "module",
 "exports": {
 "types": "./types/kokoro.d.ts",
@@ -11,7 +11,7 @@
 "default": "./dist/kokoro.js"
 },
 "scripts": {
-"build": "node -e \"fs.rmSync('dist',{recursive:true,force:true});fs.rmSync('types',{recursive:true,force:true})\" && rollup -c && tsc
+"build": "node -e \"fs.rmSync('dist',{recursive:true,force:true});fs.rmSync('types',{recursive:true,force:true})\" && rollup -c && tsc",
 "format": "prettier --write . --print-width 1000",
 "test": "vitest run"
 },
@@ -30,13 +30,20 @@
 },
 "contributors": [
 "Xenova",
-"uzen
+"uzen-zone"
 ],
 "license": "Apache-2.0",
 "description": "High-quality text-to-speech for the web",
+"homepage": "https://github.com/uzen-zone/kokoro-js#readme",
+"repository": {
+"type": "git",
+"url": "git+https://github.com/uzen-zone/kokoro-js.git"
+},
+"bugs": {
+"url": "https://github.com/uzen-zone/kokoro-js/issues"
+},
 "dependencies": {
 "@huggingface/transformers": "^4.2.0",
-"phonemize": "^1.2.0",
 "phonemizer": "^1.2.1",
 "pinyin-pro": "^3.28.1"
 },
@@ -52,6 +59,7 @@
 "types",
 "dist",
 "README.md",
+"README-zh.md",
 "LICENSE"
 ],
 "publishConfig": {