@uzen/kokoro-js 1.2.3 → 1.2.4

Files changed (4)
  1. package/LICENSE +1 -1
  2. package/README-zh.md +144 -0
  3. package/README.md +95 -117
  4. package/package.json +12 -4
package/LICENSE CHANGED
@@ -186,7 +186,7 @@
  same "printed page" as the copyright notice for easier
  identification within third-party archives.

- Copyright [yyyy] [name of copyright owner]
+ Copyright 2026 uzen-zone

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
package/README-zh.md ADDED
@@ -0,0 +1,144 @@
+ # Kokoro TTS
+
+ <p align="center">
+ <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/@uzen%2Fkokoro-js"></a>
+ <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@uzen%2Fkokoro-js"></a>
+ <a href="https://www.jsdelivr.com/package/npm/@uzen/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@uzen%2Fkokoro-js"></a>
+ <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="License" src="https://img.shields.io/npm/l/@uzen%2Fkokoro-js?color=blue"></a>
+ </p>
+
+ 语言:[English](./README.md) | 简体中文
+
+ `@uzen/kokoro-js` 是基于 [Transformers.js](https://huggingface.co/docs/transformers.js) 的 Kokoro TTS JavaScript 运行时封装。当前版本面向 v1.1 中文模型接线,内置中文音素化流程,支持普通话以及中英混合文本。
+
+ ## 目标
+
+ - 为 `kokoro.js` 增加浏览器本地的普通话和中英混合 TTS 支持。
+ - 以 Python `kokoro` 和 `misaki` v1.1 中文前端作为中文音素化、文本归一化、变调和儿化行为参考。
+ - 保持 JavaScript 包运行时自包含且适合浏览器使用;Python 只作为行为参考,不作为运行时依赖。
+ - 面向 v1.1 中文 ONNX 模型、本地 voice 资产和 WebGPU 优先路径,提供可实际部署的 Web 推理体验。
+ - 中英混合文本中的英文片段保留可读输出,走现有英文音素化路径,而不是丢弃或替换为未知符号。
+
+ ## 当前状态
+
+ - 中文音素化会直接输出 v1.1 中文 tokenizer 符号,不再回落到英文 espeak 音素化。
+ - 已覆盖常见普通话文本归一化场景,包括时间、日期、数字范围、分数、负数、百分比、电话号码、温度、度量衡、金额和量词数字。
+ - 已实现一批声调相关规则,包括常见轻声词和粒子、叠词轻声、`一`/`不` 变调、三声变调和儿化处理。
+ - Golden corpus 以 Python `misaki` v1.1 输出作为参考;当前计划记录 164 条样例,其中 143 条 match,21 条已知 gap。
+ - 剩余 gap 主要来自 `Intl.Segmenter` 与 `jieba.posseg` 的分词差异、浏览器路径缺少 POS 信息,以及长尾文本归一化规则尚未完全覆盖。
+
+ ## 特点
+
+ - 通过 Transformers.js 在浏览器和 Node.js 中运行 Kokoro TTS。
+ - 支持 v1.1 中文 ONNX 模型:`onnx-community/Kokoro-82M-v1.1-zh-ONNX`。
+ - 支持普通话文本和中英混合文本,中文会走显式中文音素化路径。
+ - 通过 `voicePath` 加载本地 voice 文件;npm 包不内置 voice `.bin` 文件。
+ - 支持 `tts.generate()` 一次性生成,也支持 `tts.stream()` 分段生成。
+
+ ## 待优化项
+
+ - 推理前需要应用自行托管或复制 voice `.bin` 文件。
+ - 当前注册的英文 voice 只有 `af_maple`、`af_sol` 和 `bf_vale`。
+ - 中文音素化由 JavaScript 实现,长尾多音字或文本归一化场景可能仍与 Python `misaki` 有差异。
+ - 浏览器性能依赖 WebGPU 可用性;只有在 `webgpu` 不可用时才建议回退到 `wasm`。
+ - 推荐使用 `fp32`、`fp16` 和 `q4f16`;`q8`、`q4` 虽然可由加载器支持,但不推荐用于该模型。
+
+ ## 安装
+
+ ```bash
+ npm i @uzen/kokoro-js
+ ```
+
+ 安装本包时会同时安装运行依赖,包括 `@huggingface/transformers`。Transformers.js 会进一步带入 ONNX Runtime 相关传递依赖,例如 `onnxruntime-web`;应用不需要单独安装这些包。
+
+ ## 模型和声音
+
+ 使用 v1.1 中文 ONNX 模型:
+
+ ```txt
+ onnx-community/Kokoro-82M-v1.1-zh-ONNX
+ ```
+
+ voice `.bin` 文件不会打包进 npm 包。请从模型仓库下载需要的 voice 文件,并通过 `voicePath` 暴露给运行时。
+
+ - 浏览器/Vite 应用:把 `zf_001.bin` 这类文件放到 `public/kokoro/voices/`;默认 `voicePath` 是 `/kokoro/voices`。
+ - Node.js:把 voice 文件放到本地目录,并把该目录作为 `voicePath` 传入,例如 `voicePath: "./voices"`。
+
+ 当前注册的 voice 包括 v1.1 中文 voice、两个美式英语 voice 和一个英式英语 voice:
+
+ ```txt
+ af_maple, af_sol, bf_vale
+ zf_001, zf_002, zf_003, zf_004, zf_005, zf_006, zf_007, zf_008,
+ zf_017, zf_018, zf_019, zf_021, zf_022, zf_023, zf_024, zf_026,
+ zf_027, zf_028, zf_032, zf_036, zf_038, zf_039, zf_040, zf_042,
+ zf_043, zf_044, zf_046, zf_047, zf_048, zf_049, zf_051, zf_059,
+ zf_060, zf_067, zf_070, zf_071, zf_072, zf_073, zf_074, zf_075,
+ zf_076, zf_077, zf_078, zf_079, zf_083, zf_084, zf_085, zf_086,
+ zf_087, zf_088, zf_090, zf_092, zf_093, zf_094, zf_099,
+ zm_009, zm_010, zm_011, zm_012, zm_013, zm_014, zm_015, zm_016,
+ zm_020, zm_025, zm_029, zm_030, zm_031, zm_033, zm_034, zm_035,
+ zm_037, zm_041, zm_045, zm_050, zm_052, zm_053, zm_054, zm_055,
+ zm_056, zm_057, zm_058, zm_061, zm_062, zm_063, zm_064, zm_065,
+ zm_066, zm_068, zm_069, zm_080, zm_081, zm_082, zm_089, zm_091,
+ zm_095, zm_096, zm_097, zm_098, zm_100
+ ```
+
+ 运行时请以 `tts.list_voices()` 或 `tts.voices` 返回的结果为准。
+
+ ## 生成语音
+
+ ```js
+ import { KokoroTTS } from "@uzen/kokoro-js";
+
+ const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
+ const tts = await KokoroTTS.from_pretrained(model_id, {
+   dtype: "fp32",
+   device: "webgpu",
+   voicePath: "/kokoro/voices",
+ });
+
+ const audio = await tts.generate("你好,欢迎使用 Kokoro 中文语音。", {
+   voice: "zf_001",
+   speed: 1,
+ });
+
+ audio.save("audio.wav");
+ ```
+
+ 默认推荐使用 `"fp32"` 作为 `dtype`,浏览器默认推荐使用 `"webgpu"` 作为 `device`。推荐的 `dtype` 是 `"fp32"`、`"fp16"` 和 `"q4f16"`;`"q8"`、`"q4"` 等量化版本虽然可由加载器支持,但不推荐用于该模型。支持的 `device` 包括 `"webgpu"`、`"wasm"`、`"cpu"` 或 `null`;浏览器优先使用 `"webgpu"`,不可用时回退到 `"wasm"`,Node.js 使用 `"cpu"`。
+
+ ## 流式生成
+
+ ```js
+ import { KokoroTTS, TextSplitterStream } from "@uzen/kokoro-js";
+
+ const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
+ const tts = await KokoroTTS.from_pretrained(model_id, {
+   dtype: "fp32",
+   device: "webgpu",
+ });
+
+ const splitter = new TextSplitterStream();
+ const stream = tts.stream(splitter, { voice: "zf_001" });
+
+ (async () => {
+   let index = 0;
+   for await (const { text, phonemes, audio } of stream) {
+     console.log({ text, phonemes });
+     audio.save(`audio-${index++}.wav`);
+   }
+ })();
+
+ splitter.push("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
+ splitter.close();
+ ```
+
+ `tts.stream()` 也可以直接接收字符串。默认会使用 `TextSplitterStream` 进行句子切分,并对过长片段按标点继续切分。
+
+ ## API 说明
+
+ - `KokoroTTS.from_pretrained(model_id, options)` 通过 Transformers.js 加载模型和 tokenizer。
+ - `tts.generate(text, { voice, speed })` 返回 Transformers.js 的 `RawAudio` 对象。
+ - `tts.stream(textOrSplitter, options)` 为每个生成片段产出 `{ text, phonemes, audio }`。
+ - `voicePath` 是基础路径,不是完整文件路径;浏览器会加载 `${voicePath}/${voice}.bin`,Node.js 会按同样模式从本地文件系统解析。
+ - `zf_` 和 `zm_` 开头的中文 voice 使用中文音素化路径;`af_` 和 `bf_` 开头的英文 voice 使用英文音素化路径。
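The `device` guidance in this README (prefer `"webgpu"` in browsers, fall back to `"wasm"` when it is unavailable, use `"cpu"` in Node.js) can be sketched as a small helper. This is only an illustration under stated assumptions: `detectEnvironment` and `pickDevice` are hypothetical names that are not part of `@uzen/kokoro-js`, and the presence of the standard `navigator.gpu` property is used as a rough signal of WebGPU availability.

```javascript
// Hypothetical helpers (not part of @uzen/kokoro-js): choose a `device`
// value that follows the README guidance.

// Collect the two facts the decision needs from the current runtime.
function detectEnvironment() {
  const isNode =
    typeof process !== "undefined" &&
    Boolean(process.versions && process.versions.node);
  // `navigator.gpu` is the standard WebGPU entry point; its presence is
  // treated here as a rough availability signal (an assumption).
  const hasWebGPU =
    typeof navigator !== "undefined" && navigator !== null && "gpu" in navigator;
  return { isNode, hasWebGPU };
}

// Node.js uses "cpu"; browsers prefer "webgpu" and fall back to "wasm".
function pickDevice({ isNode, hasWebGPU }) {
  if (isNode) return "cpu";
  return hasWebGPU ? "webgpu" : "wasm";
}

const device = pickDevice(detectEnvironment());
console.log(device);
```

The returned value would then be passed as `device` to `KokoroTTS.from_pretrained`, together with one of the recommended `dtype` values.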
package/README.md CHANGED
@@ -4,163 +4,141 @@
  <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/@uzen%2Fkokoro-js"></a>
  <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@uzen%2Fkokoro-js"></a>
  <a href="https://www.jsdelivr.com/package/npm/@uzen/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@uzen%2Fkokoro-js"></a>
- <a href="https://github.com/hexgrad/kokoro/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/hexgrad/kokoro?color=blue"></a>
- <a href="https://huggingface.co/spaces/webml-community/kokoro-webgpu"><img alt="Demo" src="https://img.shields.io/badge/Hugging_Face-demo-green"></a>
+ <a href="https://www.npmjs.com/package/@uzen/kokoro-js"><img alt="License" src="https://img.shields.io/npm/l/@uzen%2Fkokoro-js?color=blue"></a>
  </p>

- Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out). This JavaScript library allows the model to be run 100% locally in the browser thanks to [🤗 Transformers.js](https://huggingface.co/docs/transformers.js). Try it out using our [online demo](https://huggingface.co/spaces/webml-community/kokoro-webgpu)!
+ Language: English | [简体中文](./README-zh.md)

- ## Usage
+ `@uzen/kokoro-js` is a JavaScript runtime wrapper for Kokoro TTS based on [Transformers.js](https://huggingface.co/docs/transformers.js). This fork is wired for the v1.1 Chinese model and supports Mandarin plus mixed Chinese/English text through the built-in phonemizer.

- First, install the `@uzen/kokoro-js` library from [NPM](https://npmjs.com/package/@uzen/kokoro-js) using:
+ ## Goals
+
+ - Add browser-local Mandarin and mixed Chinese/English TTS support to `kokoro.js`.
+ - Use Python `kokoro` plus `misaki` v1.1 Chinese frontend behavior as the reference for Chinese phonemization, normalization, tone sandhi, and erhua handling.
+ - Keep the JavaScript package self-contained and browser-friendly at runtime; Python is used only as a behavior reference, not as a runtime dependency.
+ - Target practical web inference with the v1.1 Chinese ONNX model, local voice assets, and a WebGPU-first default path.
+ - Preserve usable English spans in mixed text by routing them through the existing English phonemizer path instead of dropping them.
+
+ ## Current Status
+
+ - Chinese phonemization now emits v1.1 Chinese tokenizer symbols directly instead of falling back to English espeak phonemization.
+ - The implementation covers common Mandarin normalization cases including time expressions, dates, numeric ranges, fractions, negative numbers, percentages, phone numbers, temperatures, measurements, money, and quantified numbers.
+ - Tone-related handling includes selected neutral-tone words and particles, reduplication, `一`/`不` sandhi, third-tone sandhi, and erhua handling.
+ - Golden corpus tracking is based on Python `misaki` v1.1 outputs; the current plan records 164 examples with 143 matches and 21 known gaps.
+ - Remaining gaps mainly come from `Intl.Segmenter` versus `jieba.posseg` segmentation differences, missing POS information in the browser path, and long-tail text normalization cases.
+
+ ## Features
+
+ - Runs Kokoro TTS in browser and Node.js through Transformers.js.
+ - Supports the v1.1 Chinese ONNX model: `onnx-community/Kokoro-82M-v1.1-zh-ONNX`.
+ - Supports Mandarin text and mixed Chinese/English text with explicit Chinese phonemization.
+ - Provides local voice loading through `voicePath`; voice `.bin` files are not bundled in the npm package.
+ - Supports single-shot generation through `tts.generate()` and chunked generation through `tts.stream()`.
+
+ ## Known Limitations
+
+ - Voice assets must be hosted or copied by the application before inference can run.
+ - The registered English voice set is limited to `af_maple`, `af_sol`, and `bf_vale`.
+ - Chinese phonemization is implemented in JavaScript and may still differ from Python `misaki` on long-tail polyphones or text normalization cases.
+ - Browser performance depends on WebGPU availability; use `wasm` only as a fallback when `webgpu` is unavailable.
+ - `fp32`, `fp16`, and `q4f16` are recommended; `q8` and `q4` are supported by the loader but not recommended for this model.
+
+ ## Install

  ```bash
  npm i @uzen/kokoro-js
  ```

- Voice `.bin` files are not bundled in the npm package. Download the voices you need from the `voices/` directory of [`onnx-community/Kokoro-82M-v1.1-zh-ONNX`](https://huggingface.co/onnx-community/Kokoro-82M-v1.1-zh-ONNX/tree/main/voices), then make them available to the runtime:
+ Installing this package also installs its runtime dependencies, including `@huggingface/transformers`. Transformers.js brings in ONNX Runtime packages such as `onnxruntime-web` as transitive dependencies; applications do not need to install them separately.

- - Browser/Vite apps: place files such as `zf_001.bin` under `public/kokoro/voices/`; the default `voicePath` is `/kokoro/voices`.
- - Node.js: place the files in a local directory and pass `voicePath`, for example `voicePath: "./voices"`.
+ ## Model And Voices

- You can then generate speech as follows:
+ Use the v1.1 Chinese ONNX model:
+
+ ```txt
+ onnx-community/Kokoro-82M-v1.1-zh-ONNX
+ ```
+
+ Voice `.bin` files are not bundled in the npm package. Download the voice files you need from the model repository and expose them through `voicePath`.
+
+ - Browser/Vite apps: put files such as `zf_001.bin` under `public/kokoro/voices/`; the default `voicePath` is `/kokoro/voices`.
+ - Node.js: put the files in a local directory and pass that directory as `voicePath`, for example `voicePath: "./voices"`.
+
+ Current registered voices are the v1.1 Chinese voices plus two American English voices and one British English voice:
+
+ ```txt
+ af_maple, af_sol, bf_vale
+ zf_001, zf_002, zf_003, zf_004, zf_005, zf_006, zf_007, zf_008,
+ zf_017, zf_018, zf_019, zf_021, zf_022, zf_023, zf_024, zf_026,
+ zf_027, zf_028, zf_032, zf_036, zf_038, zf_039, zf_040, zf_042,
+ zf_043, zf_044, zf_046, zf_047, zf_048, zf_049, zf_051, zf_059,
+ zf_060, zf_067, zf_070, zf_071, zf_072, zf_073, zf_074, zf_075,
+ zf_076, zf_077, zf_078, zf_079, zf_083, zf_084, zf_085, zf_086,
+ zf_087, zf_088, zf_090, zf_092, zf_093, zf_094, zf_099,
+ zm_009, zm_010, zm_011, zm_012, zm_013, zm_014, zm_015, zm_016,
+ zm_020, zm_025, zm_029, zm_030, zm_031, zm_033, zm_034, zm_035,
+ zm_037, zm_041, zm_045, zm_050, zm_052, zm_053, zm_054, zm_055,
+ zm_056, zm_057, zm_058, zm_061, zm_062, zm_063, zm_064, zm_065,
+ zm_066, zm_068, zm_069, zm_080, zm_081, zm_082, zm_089, zm_091,
+ zm_095, zm_096, zm_097, zm_098, zm_100
+ ```
+
+ Call `tts.list_voices()` or read `tts.voices` at runtime for the authoritative list from the installed package.
+
+ ## Generate Speech

  ```js
  import { KokoroTTS } from "@uzen/kokoro-js";

  const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
  const tts = await KokoroTTS.from_pretrained(model_id, {
-   dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
-   device: "wasm", // Options: "wasm", "webgpu" (web) or "cpu" (node). If using "webgpu", we recommend using dtype="fp32".
-   // voicePath: "./voices", // Node.js example. Browser default is "/kokoro/voices".
+   dtype: "fp32",
+   device: "webgpu",
+   voicePath: "/kokoro/voices",
  });

- const text = "你好,欢迎使用 Kokoro 中文语音。";
- const audio = await tts.generate(text, {
-   // Use `tts.list_voices()` to list all available voices
+ const audio = await tts.generate("你好,欢迎使用 Kokoro 中文语音。", {
    voice: "zf_001",
+   speed: 1,
  });
+
  audio.save("audio.wav");
  ```

- Or if you'd prefer to stream the output, you can do that with:
+ Use `"fp32"` as the default `dtype` and `"webgpu"` as the default browser `device`. Recommended `dtype` values are `"fp32"`, `"fp16"`, and `"q4f16"`. Other supported quantized variants such as `"q8"` and `"q4"` are not recommended for this model. Supported `device` values are `"webgpu"`, `"wasm"`, `"cpu"`, or `null`; browser apps should prefer `"webgpu"` when available and can fall back to `"wasm"`, while Node.js uses `"cpu"`.
+
+ ## Stream Speech

  ```js
  import { KokoroTTS, TextSplitterStream } from "@uzen/kokoro-js";

  const model_id = "onnx-community/Kokoro-82M-v1.1-zh-ONNX";
  const tts = await KokoroTTS.from_pretrained(model_id, {
-   dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
-   // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
+   dtype: "fp32",
+   device: "webgpu",
  });

- // First, set up the stream
  const splitter = new TextSplitterStream();
  const stream = tts.stream(splitter, { voice: "zf_001" });
+
  (async () => {
-   let i = 0;
+   let index = 0;
    for await (const { text, phonemes, audio } of stream) {
      console.log({ text, phonemes });
-     audio.save(`audio-${i++}.wav`);
+     audio.save(`audio-${index++}.wav`);
    }
  })();

- // Next, add text to the stream. Note that the text can be added at different times.
- // For this example, let's pretend we're consuming text from an LLM, one word at a time.
- const text = "Kokoro 是一个轻量级本地语音模型,支持中文和中英混合文本。它可以在浏览器里通过 Transformers.js 运行,也可以在 Node.js 中生成音频文件。";
- const tokens = text.match(/\s*\S+/g);
- for (const token of tokens) {
-   splitter.push(token);
-   await new Promise((resolve) => setTimeout(resolve, 10));
- }
-
- // Finally, close the stream to signal that no more text will be added.
+ splitter.push("Kokoro 支持中文和中英 mixed text。它可以分段生成音频。");
  splitter.close();
-
- // Alternatively, if you'd like to keep the stream open, but flush any remaining text, you can use the `flush` method.
- // splitter.flush();
  ```

- ## Voices/Samples
-
- > [!TIP]
- > You can find samples for each of the voices in the [model card](https://huggingface.co/onnx-community/Kokoro-82M-v1.1-zh-ONNX#samples) on Hugging Face.
-
- ### American English
-
- | Name | Traits | Target Quality | Training Duration | Overall Grade |
- | ------------ | ------ | -------------- | ----------------- | ------------- |
- | **af_heart** | 🚺❤️ | | | **A** |
- | af_alloy | 🚺 | B | MM minutes | C |
- | af_aoede | 🚺 | B | H hours | C+ |
- | af_bella | 🚺🔥 | **A** | **HH hours** | **A-** |
- | af_jessica | 🚺 | C | MM minutes | D |
- | af_kore | 🚺 | B | H hours | C+ |
- | af_nicole | 🚺🎧 | B | **HH hours** | B- |
- | af_nova | 🚺 | B | MM minutes | C |
- | af_river | 🚺 | C | MM minutes | D |
- | af_sarah | 🚺 | B | H hours | C+ |
- | af_sky | 🚺 | B | _M minutes_ 🤏 | C- |
- | am_adam | 🚹 | D | H hours | F+ |
- | am_echo | 🚹 | C | MM minutes | D |
- | am_eric | 🚹 | C | MM minutes | D |
- | am_fenrir | 🚹 | B | H hours | C+ |
- | am_liam | 🚹 | C | MM minutes | D |
- | am_michael | 🚹 | B | H hours | C+ |
- | am_onyx | 🚹 | C | MM minutes | D |
- | am_puck | 🚹 | B | H hours | C+ |
- | am_santa | 🚹 | C | _M minutes_ 🤏 | D- |
-
- ### Chinese (Mandarin) — v1.1 zh voices
-
- | Name | Gender | Name | Gender | Name | Gender |
- | -------- | ------ | ------- | ------ | ------- | ------ |
- | zf_001 | 🚺 | zf_046 | 🚺 | zm_052 | 🚹 |
- | zf_002 | 🚺 | zf_047 | 🚺 | zm_053 | 🚹 |
- | zf_003 | 🚺 | zf_048 | 🚺 | zm_054 | 🚹 |
- | zf_004 | 🚺 | zf_049 | 🚺 | zm_055 | 🚹 |
- | zf_005 | 🚺 | zf_051 | 🚺 | zm_056 | 🚹 |
- | zf_006 | 🚺 | zf_059 | 🚺 | zm_057 | 🚹 |
- | zf_007 | 🚺 | zf_060 | 🚺 | zm_058 | 🚹 |
- | zf_008 | 🚺 | zf_067 | 🚺 | zm_061 | 🚹 |
- | zf_017 | 🚺 | zf_070 | 🚺 | zm_062 | 🚹 |
- | zf_018 | 🚺 | zf_071 | 🚺 | zm_063 | 🚹 |
- | zf_019 | 🚺 | zf_072 | 🚺 | zm_064 | 🚹 |
- | zf_021 | 🚺 | zf_073 | 🚺 | zm_065 | 🚹 |
- | zf_022 | 🚺 | zf_074 | 🚺 | zm_066 | 🚹 |
- | zf_023 | 🚺 | zf_075 | 🚺 | zm_068 | 🚹 |
- | zf_024 | 🚺 | zf_076 | 🚺 | zm_069 | 🚹 |
- | zf_026 | 🚺 | zf_077 | 🚺 | zm_080 | 🚹 |
- | zf_027 | 🚺 | zf_078 | 🚺 | zm_081 | 🚹 |
- | zf_028 | 🚺 | zf_079 | 🚺 | zm_082 | 🚹 |
- | zf_032 | 🚺 | zf_083 | 🚺 | zm_089 | 🚹 |
- | zf_036 | 🚺 | zf_084 | 🚺 | zm_091 | 🚹 |
- | zf_038 | 🚺 | zf_085 | 🚺 | zm_095 | 🚹 |
- | zf_039 | 🚺 | zf_086 | 🚺 | zm_096 | 🚹 |
- | zf_040 | 🚺 | zf_087 | 🚺 | zm_097 | 🚹 |
- | zf_042 | 🚺 | zf_088 | 🚺 | zm_098 | 🚹 |
- | zf_043 | 🚺 | zf_090 | 🚺 | zm_100 | 🚹 |
- | zf_044 | 🚺 | zf_092 | 🚺 | zm_009 | 🚹 |
- | zf_092 | 🚺 | zf_093 | 🚺 | zm_010 | 🚹 |
- | zf_094 | 🚺 | zf_099 | 🚺 | zm_011 | 🚹 |
- | zm_012 | 🚹 | zm_020 | 🚹 | zm_029 | 🚹 |
- | zm_013 | 🚹 | zm_025 | 🚹 | zm_030 | 🚹 |
- | zm_014 | 🚹 | zm_031 | 🚹 | zm_033 | 🚹 |
- | zm_015 | 🚹 | zm_034 | 🚹 | zm_035 | 🚹 |
- | zm_016 | 🚹 | zm_037 | 🚹 | zm_041 | 🚹 |
- | zm_045 | 🚹 | zm_050 | 🚹 | | |
-
- Use with v1.1 zh model: `onnx-community/Kokoro-82M-v1.1-zh-ONNX`.
-
- ### British English
-
- | Name | Traits | Target Quality | Training Duration | Overall Grade |
- | ----------- | ------ | -------------- | ----------------- | ------------- |
- | bf_alice | 🚺 | C | MM minutes | D |
- | bf_emma | 🚺 | B | **HH hours** | B- |
- | bf_isabella | 🚺 | B | MM minutes | C |
- | bf_lily | 🚺 | C | MM minutes | D |
- | bm_daniel | 🚹 | C | MM minutes | D |
- | bm_fable | 🚹 | B | MM minutes | C |
- | bm_george | 🚹 | B | MM minutes | C |
- | bm_lewis | 🚹 | C | H hours | D+ |
+ `tts.stream()` also accepts a string directly. By default it uses `TextSplitterStream` sentence splitting and further splits long chunks at punctuation boundaries.
+
+ ## API Notes
+
+ - `KokoroTTS.from_pretrained(model_id, options)` loads the model and tokenizer through Transformers.js.
+ - `tts.generate(text, { voice, speed })` returns a Transformers.js `RawAudio` object.
+ - `tts.stream(textOrSplitter, options)` yields `{ text, phonemes, audio }` for each generated chunk.
+ - `voicePath` is a base path, not a full file path; the runtime loads `${voicePath}/${voice}.bin` in browsers and resolves the same pattern from the local filesystem in Node.js.
+ - Chinese voices begin with `zf_` or `zm_` and use the Chinese phonemizer path. English voices beginning with `af_` or `bf_` use the English phonemizer path.
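The last two API notes describe two simple rules: a voice file lives at `${voicePath}/${voice}.bin`, and the voice-name prefix decides which phonemizer path is used. A minimal sketch of those rules follows. The function names `voiceFileLocation` and `phonemizerPathFor` are hypothetical and do not come from the package, the labels `"chinese"` and `"english"` are illustrative, and trimming a trailing slash from `voicePath` is an assumption of this sketch rather than documented behavior.

```javascript
// Hypothetical helpers (not part of @uzen/kokoro-js) that mirror the
// rules stated in the API notes.

// Build the location of a voice file from a base path and a voice name.
function voiceFileLocation(voicePath, voice) {
  // Assumption: tolerate a trailing slash so "/kokoro/voices/" and
  // "/kokoro/voices" produce the same result.
  const base = voicePath.replace(/\/+$/, "");
  return `${base}/${voice}.bin`;
}

// Map a voice name to the phonemizer path described for its prefix.
function phonemizerPathFor(voice) {
  if (/^z[fm]_/.test(voice)) return "chinese"; // zf_ and zm_ voices
  if (/^[ab]f_/.test(voice)) return "english"; // af_ and bf_ voices
  throw new Error(`Unrecognized voice prefix: ${voice}`);
}

console.log(voiceFileLocation("/kokoro/voices", "zf_001"));
console.log(phonemizerPathFor("af_sol"));
```

Throwing on an unknown prefix is a design choice of the sketch; at runtime, `tts.list_voices()` remains the authoritative source for which voices are registered.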
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@uzen/kokoro-js",
-   "version": "1.2.3",
+   "version": "1.2.4",
    "type": "module",
    "exports": {
      "types": "./types/kokoro.d.ts",
  "types": "./types/kokoro.d.ts",
@@ -11,7 +11,7 @@
      "default": "./dist/kokoro.js"
    },
    "scripts": {
-     "build": "node -e \"fs.rmSync('dist',{recursive:true,force:true});fs.rmSync('types',{recursive:true,force:true})\" && rollup -c && tsc && node -e \"fs.copyFileSync('../LICENSE','LICENSE')\"",
+     "build": "node -e \"fs.rmSync('dist',{recursive:true,force:true});fs.rmSync('types',{recursive:true,force:true})\" && rollup -c && tsc",
      "format": "prettier --write . --print-width 1000",
      "test": "vitest run"
    },
@@ -30,13 +30,20 @@
    },
    "contributors": [
      "Xenova",
-     "uzen.zone"
+     "uzen-zone"
    ],
    "license": "Apache-2.0",
    "description": "High-quality text-to-speech for the web",
+   "homepage": "https://github.com/uzen-zone/kokoro-js#readme",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/uzen-zone/kokoro-js.git"
+   },
+   "bugs": {
+     "url": "https://github.com/uzen-zone/kokoro-js/issues"
+   },
    "dependencies": {
      "@huggingface/transformers": "^4.2.0",
-     "phonemize": "^1.2.0",
      "phonemizer": "^1.2.1",
      "pinyin-pro": "^3.28.1"
    },
@@ -52,6 +59,7 @@
      "types",
      "dist",
      "README.md",
+     "README-zh.md",
      "LICENSE"
    ],
    "publishConfig": {