opencc-wasm 0.2.1 → 0.4.0

package/README.md CHANGED
@@ -1,74 +1,330 @@
1
1
  # opencc-wasm
2
2
 
3
- This package provides a WebAssembly backend for OpenCC, fully compatible with the `opencc-js` public API. It bundles the OpenCC C++ core (plus marisa) compiled via Emscripten, plus the official OpenCC configs and prebuilt `.ocd2` dictionaries (placed under `dist/data/` at build time).
4
- License: Apache-2.0 (see LICENSE).
3
+ [![npm version](https://img.shields.io/npm/v/opencc-wasm.svg)](https://www.npmjs.com/package/opencc-wasm)
4
+ [![CDN](https://img.shields.io/badge/CDN-jsDelivr-orange.svg)](https://cdn.jsdelivr.net/npm/opencc-wasm@latest/dist/esm/index.js)
5
+ [![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
5
6
 
6
- ## Features
7
- - Same API surface as `opencc-js`: `OpenCC.Converter`, `CustomConverter`, `ConverterFactory`, and locale presets.
8
- - No native bindings required; runs in Node.js and modern browsers (ESM), with a CommonJS build for legacy `require`.
9
- - On-demand loading of configs and dictionaries from the package’s `data/` directory into the Emscripten FS; each config/dict is cached after first use.
7
+ [繁體中文](README.zh.md)
8
+
9
+ > 🚀 **Out-of-the-box Chinese text conversion library** - 3 lines of code, auto-loads configs and dictionaries from CDN!
10
+
11
+ WebAssembly port of OpenCC (Open Chinese Convert) with full API compatibility. Bundles the official OpenCC C++ core compiled via Emscripten, plus all official configs and prebuilt `.ocd2` dictionaries.
12
+
13
+ **License:** Apache-2.0
14
+
15
+ ## ✨ Features
16
+
17
+ - 🎯 **Zero Configuration** - Auto-loads all configs and dictionaries from CDN
18
+ - 🔥 **3 Lines to Start** - Simplest API, just import and use
19
+ - 🌐 **CDN Ready** - Use directly from jsDelivr/unpkg without bundler
20
+ - 📦 **All-in-One** - Includes all 14+ official conversion types
21
+ - ⚡ **Auto Caching** - Resources cached after first load
22
+ - 🔧 **Full Compatibility** - Compatible with `opencc-js` API
23
+ - 🚫 **No Native Bindings** - Pure WASM, cross-platform
24
+ - 💻 **Universal** - Works in Node.js, browsers, Deno, etc.
25
+
26
+ ## 🚀 Quick Start
27
+
28
+ ### Browser (CDN - Zero Install!)
29
+
30
+ ```html
31
+ <script type="module">
32
+ // 1. Import from CDN
33
+ import OpenCC from "https://cdn.jsdelivr.net/npm/opencc-wasm@0.4.0/dist/esm/index.js";
34
+
35
+ // 2. Create converter (auto-downloads everything!)
36
+ const converter = OpenCC.Converter({ from: "cn", to: "tw" });
37
+
38
+ // 3. Convert - Done!
39
+ const result = await converter("简体中文");
40
+ console.log(result); // 簡體中文
41
+ </script>
42
+ ```
43
+
44
+ **That's it!** All configs and dictionaries are automatically downloaded from CDN.
45
+
46
+ ### CDN (Converter API)
47
+
48
+ ```javascript
49
+ import OpenCC from "https://cdn.jsdelivr.net/npm/opencc-wasm@0.4.0/dist/esm/index.js";
50
+
51
+ const converter = OpenCC.Converter({ from: "cn", to: "t" });
52
+ const result = await converter("简体中文");
53
+ console.log(result); // 簡體中文
54
+ ```
55
+
56
+ Example source: `test/cdn-simple.mjs`
57
+
58
+ ### Node.js (NPM)
10
59
 
11
- ## Installation
12
60
  ```bash
13
61
  npm install opencc-wasm
14
62
  ```
15
63
 
16
- ## Usage
17
- ```js
64
+ ```javascript
18
65
  import OpenCC from "opencc-wasm";
19
66
 
20
- // Convert Traditional Chinese (HK) to Simplified (CN)
21
- const converter = OpenCC.Converter({ from: "hk", to: "cn" });
22
- console.log(await converter("漢語")); // => 汉语
67
+ const converter = OpenCC.Converter({ from: "cn", to: "tw" });
68
+ const result = await converter("简体中文");
69
+ console.log(result); // 簡體中文
70
+ ```
71
+
72
+ ## 📖 API Reference
73
+
74
+ ### OpenCC.Converter() - Create Converter
75
+
76
+ Two ways to specify conversions:
77
+
78
+ #### Method 1: Using `config` parameter (Recommended)
79
+
80
+ Specify the OpenCC config name directly:
81
+
82
+ ```javascript
83
+ // Simplified → Traditional (Taiwan phrases)
84
+ const converter = OpenCC.Converter({ config: "s2twp" });
85
+ const result = await converter("服务器软件"); // 伺服器軟體
86
+ ```
87
+
88
+ **Supported configs:**
89
+
90
+ | Config | Description | Example |
91
+ |--------|-------------|---------|
92
+ | `s2t` | Simplified → Traditional | 简体 → 簡體 |
93
+ | `s2tw` | Simplified → Taiwan | 软件 → 軟件 |
94
+ | `s2twp` | Simplified → Taiwan (phrases) | 软件 → 軟體 |
95
+ | `s2hk` | Simplified → Hong Kong | 打印机 → 打印機 |
96
+ | `t2s` | Traditional → Simplified | 繁體 → 繁体 |
97
+ | `t2tw` | Traditional → Taiwan | 台灣 → 臺灣 |
98
+ | `t2hk` | Traditional → Hong Kong | 香港 → 香港 |
99
+ | `t2jp` | Traditional → Japanese Shinjitai | 繁體 → 繁体 |
100
+ | `tw2s` | Taiwan → Simplified | 軟體 → 软体 |
101
+ | `tw2sp` | Taiwan → Simplified (phrases) | 滑鼠 → 鼠标 |
102
+ | `tw2t` | Taiwan → Traditional | 臺灣 → 台灣 |
103
+ | `hk2s` | Hong Kong → Simplified | 打印機 → 打印机 |
104
+ | `hk2t` | Hong Kong → Traditional | 香港 → 香港 |
105
+ | `jp2t` | Japanese Shinjitai → Traditional | 繁体 → 繁體 |
106
+ | `t2cngov` | Traditional → CN Gov Standard | 潮溼 → 潮湿 |
107
+ | `t2cngov_keep_simp` | Traditional → CN Gov (Keep Simp) | 简体繁體 → 简体繁體 |
23
108
 
24
- // Custom dictionary
25
- const custom = OpenCC.CustomConverter([
26
- ["“", "「"],
27
- ["”", "」"],
28
- ["‘", "『"],
29
- ["’", "』"],
109
+ #### Method 2: Using `from`/`to` parameters (Legacy)
110
+
111
+ Specify source and target locales:
112
+
113
+ ```javascript
114
+ const converter = OpenCC.Converter({ from: "cn", to: "twp" });
115
+ const result = await converter("服务器"); // 伺服器
116
+ ```
117
+
118
+ **Locale codes:**
119
+
120
+ | Code | Description |
121
+ |------|-------------|
122
+ | `cn` | Simplified Chinese (Mainland) |
123
+ | `tw` | Traditional Chinese (Taiwan) |
124
+ | `twp` | Taiwan with phrases |
125
+ | `hk` | Traditional Chinese (Hong Kong) |
126
+ | `t` | Traditional Chinese (general) |
127
+ | `s` | Simplified Chinese (alias) |
128
+ | `sp` | Simplified with phrases |
129
+ | `jp` | Japanese Shinjitai |
130
+
131
+ **Both methods work identically!** Choose what you prefer.
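To make the relationship between the two methods concrete, here is a minimal sketch of how a `from`/`to` pair could be mapped to a config name. It assumes that `cn` is treated as an alias of `s` and that the config name is simply `<from>2<to>`, which matches the examples above (`cn` + `twp` gives `s2twp`). `localeToConfig` and `SUPPORTED_CONFIGS` are hypothetical names for illustration and not part of the package API; the package's actual lookup may differ. The `t2cngov` configs are left out because the README does not list a locale code for them.

```javascript
// Config names taken from the "Supported configs" table (cngov variants omitted).
const SUPPORTED_CONFIGS = new Set([
  "s2t", "s2tw", "s2twp", "s2hk",
  "t2s", "t2tw", "t2hk", "t2jp",
  "tw2s", "tw2sp", "tw2t",
  "hk2s", "hk2t", "jp2t",
]);

// Hypothetical helper: normalize "cn" to "s", then join the codes with "2".
function localeToConfig(from, to) {
  const normalize = (code) => (code === "cn" ? "s" : code);
  const name = `${normalize(from)}2${normalize(to)}`;
  if (!SUPPORTED_CONFIGS.has(name)) {
    throw new Error(`No OpenCC config for ${from} -> ${to}`);
  }
  return name;
}

console.log(localeToConfig("cn", "twp")); // s2twp
console.log(localeToConfig("hk", "cn"));  // hk2s
```

Under this assumption, `OpenCC.Converter({ from: "cn", to: "twp" })` and `OpenCC.Converter({ config: "s2twp" })` would select the same config.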
132
+
133
+ ### OpenCC.ConverterFactory() - With Custom Dictionary
134
+
135
+ ```javascript
136
+ const converter = OpenCC.ConverterFactory(
137
+ "cn", // from
138
+ "tw", // to
139
+ [ // custom dictionaries
140
+ [["服务器", "伺服器"], ["文件", "檔案"]],
141
+ "網路 网络 | 檔案 文件"
142
+ ]
143
+ );
144
+
145
+ const result = await converter("服务器上的文件通过网络传输");
146
+ // Output: 伺服器上的檔案通過網路傳輸
147
+ ```
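The example above passes custom dictionaries in two shapes: an array of pairs and a `|`-separated string. As a rough illustration of the string shape, the sketch below splits such a string into pairs. It assumes each entry is two whitespace-separated terms and entries are separated by `|`; `parseDictString` is a hypothetical helper, not a function exported by the package, and the README does not state which term of each pair is the source and which is the target.

```javascript
// Hypothetical parser for the "a b | c d" dictionary string shape.
function parseDictString(source) {
  return source
    .split("|")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [first, second] = entry.split(/\s+/);
      if (!first || second === undefined) {
        throw new Error(`Malformed dictionary entry: "${entry}"`);
      }
      return [first, second];
    });
}

console.log(JSON.stringify(parseDictString("網路 网络 | 檔案 文件")));
// [["網路","网络"],["檔案","文件"]]
```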
148
+
149
+ ### OpenCC.CustomConverter() - Pure Custom Converter
150
+
151
+ ```javascript
152
+ const converter = OpenCC.CustomConverter([
153
+ ["“", "「"],
154
+ ["”", "」"],
155
+ ["‘", "『"],
156
+ ["’", "』"],
30
157
  ]);
31
- console.log(custom("悟空道:“师父又来了。怎么叫做‘水中捞月’?”"));
32
- // => 悟空道:「師父又來了。怎麼叫做『水中撈月』?」
158
+
159
+ const result = converter("这是“引号”和‘单引号’");
160
+ // Output: 这是「引号」和『单引号』
161
+ ```
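For intuition about what a purely dictionary-driven converter does, here is a small self-contained sketch that replaces dictionary keys using longest-match-first scanning. This is an illustrative stand-in, not the package's implementation (which runs the OpenCC core in WASM); `makeCustomConverter` is a hypothetical name. The quote characters are written as Unicode escapes so the typographic quotes cannot be confused with string delimiters.

```javascript
// Illustrative longest-match replacer (not the package's implementation).
function makeCustomConverter(pairs) {
  const table = new Map(pairs);
  // Longer keys first, so overlapping entries resolve to the longest match.
  const keys = [...table.keys()]
    .filter((key) => key.length > 0)
    .sort((a, b) => b.length - a.length);

  return (text) => {
    let out = "";
    let i = 0;
    while (i < text.length) {
      const key = keys.find((k) => text.startsWith(k, i));
      if (key !== undefined) {
        out += table.get(key);
        i += key.length;
      } else {
        // No key matches here: copy one full code point unchanged.
        const ch = String.fromCodePoint(text.codePointAt(i));
        out += ch;
        i += ch.length;
      }
    }
    return out;
  };
}

const quoteConverter = makeCustomConverter([
  ["\u201C", "「"], // left double quotation mark
  ["\u201D", "」"], // right double quotation mark
  ["\u2018", "『"], // left single quotation mark
  ["\u2019", "』"], // right single quotation mark
]);

console.log(quoteConverter("这是\u201C引号\u201D和\u2018单引号\u2019"));
// 这是「引号」和『单引号』
```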
162
+
163
+ ## 💡 Usage Examples
164
+
165
+ ### React
166
+
167
+ ```jsx
168
+ import { useState } from 'react';
169
+ import OpenCC from 'opencc-wasm';
170
+
171
+ function App() {
172
+ const [output, setOutput] = useState('');
173
+
174
+ const handleConvert = async () => {
175
+ const converter = OpenCC.Converter({ config: "s2tw" });
176
+ setOutput(await converter("简体中文"));
177
+ };
178
+
179
+ return (
180
+ <div>
181
+ <button onClick={handleConvert}>Convert</button>
182
+ <div>{output}</div>
183
+ </div>
184
+ );
185
+ }
186
+ ```
187
+
188
+ ### Vue 3
189
+
190
+ ```vue
191
+ <script setup>
192
+ import { ref } from 'vue';
193
+ import OpenCC from 'opencc-wasm';
194
+
195
+ const output = ref('');
196
+
197
+ async function handleConvert() {
198
+ const converter = OpenCC.Converter({ config: "s2tw" });
199
+ output.value = await converter("简体中文");
200
+ }
201
+ </script>
202
+
203
+ <template>
204
+ <button @click="handleConvert">Convert</button>
205
+ <div>{{ output }}</div>
206
+ </template>
33
207
  ```
34
208
 
35
- ### Node (CommonJS)
36
- ```js
37
- const OpenCC = require("opencc-wasm").default;
209
+ ### Node.js CLI
210
+
211
+ ```javascript
212
+ #!/usr/bin/env node
213
+ import OpenCC from 'opencc-wasm';
214
+
215
+ const text = process.argv[2] || "简体中文";
216
+ const converter = OpenCC.Converter({ config: "s2tw" });
217
+ console.log(await converter(text));
38
218
  ```
39
219
 
40
- ## Build
220
+ ### Web Worker
41
221
 
42
- The project uses a two-stage build process with semantic separation:
222
+ ```javascript
223
+ // worker.js
224
+ import OpenCC from 'opencc-wasm';
43
225
 
44
- ### Stage 1: Build WASM (intermediate artifacts)
226
+ let converters = {};
227
+
228
+ self.onmessage = async (e) => {
229
+ const { config, text } = e.data;
230
+
231
+ if (!converters[config]) {
232
+ converters[config] = OpenCC.Converter({ config });
233
+ }
234
+
235
+ const result = await converters[config](text);
236
+ self.postMessage(result);
237
+ };
238
+ ```
239
+
240
+ ```javascript
241
+ // main.js
242
+ const worker = new Worker('worker.js', { type: 'module' });
243
+
244
+ worker.onmessage = (e) => {
245
+ console.log('Result:', e.data);
246
+ };
247
+
248
+ worker.postMessage({ config: 's2tw', text: '简体中文' });
249
+ ```
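The worker above posts back only the converted string, so a page that sends several requests in a row cannot tell which reply belongs to which request. One possible way around this is to tag each request with an id, sketched below. It assumes the worker is changed to echo the id, for example `self.postMessage({ id, result })`; that message shape and the name `createConversionClient` are assumptions for illustration, not part of the package.

```javascript
// Wraps a Worker-like object (anything with postMessage and onmessage)
// and returns a promise-based convert(config, text) function.
// Assumes the worker replies with { id, result } for each { id, config, text }.
function createConversionClient(worker) {
  let nextId = 0;
  const pending = new Map(); // request id -> resolve callback

  worker.onmessage = (e) => {
    const { id, result } = e.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(result);
    }
  };

  return function convert(config, text) {
    return new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      worker.postMessage({ id, config, text });
    });
  };
}

// Browser usage (not executed here):
//   const convert = createConversionClient(new Worker("worker.js", { type: "module" }));
//   const output = await convert("s2tw", "简体中文");
```

Because replies are matched by id, they can arrive in any order without being mixed up.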
250
+
251
+ ## 🔧 Best Practices
252
+
253
+ ### ✅ Reuse Converter Instances
254
+
255
+ ```javascript
256
+ // ✅ Good: Create once, use many times
257
+ const converter = OpenCC.Converter({ config: "s2tw" });
258
+
259
+ for (const text of manyTexts) {
260
+ await converter(text); // Fast!
261
+ }
262
+ ```
263
+
264
+ ```javascript
265
+ // ❌ Avoid: Creating new instances every time
266
+ for (const text of manyTexts) {
267
+ const converter = OpenCC.Converter({ config: "s2tw" }); // Slow!
268
+ await converter(text);
269
+ }
270
+ ```
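When the config is only known at call time, the same advice can be followed with a small cache keyed by config name, in the spirit of the `converters` object in the Web Worker example. The sketch below takes the factory as a parameter so that it runs on its own; `createConverterCache` is a hypothetical helper, and the stand-in factory only mimics the shape of `OpenCC.Converter` (a function returning an async converter).

```javascript
// Returns getConverter(config), which creates each converter at most once.
function createConverterCache(createConverter) {
  const cache = new Map(); // config name -> converter function
  return function getConverter(config) {
    if (!cache.has(config)) {
      cache.set(config, createConverter({ config }));
    }
    return cache.get(config);
  };
}

// Stand-in factory for demonstration. With the real package this would be
// something like (options) => OpenCC.Converter(options).
let created = 0;
const fakeFactory = ({ config }) => {
  created += 1;
  return async (text) => `[${config}] ${text}`;
};

const getConverter = createConverterCache(fakeFactory);
getConverter("s2tw");
getConverter("s2tw"); // reuses the cached instance
getConverter("t2s");
console.log(created); // 2
```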
271
+
272
+ ### Multiple Converters (Auto-cached)
273
+
274
+ ```javascript
275
+ // Create multiple converters (resources auto-cached)
276
+ const s2t = OpenCC.Converter({ config: "s2t" });
277
+ const s2tw = OpenCC.Converter({ config: "s2tw" });
278
+ const t2s = OpenCC.Converter({ config: "t2s" });
279
+
280
+ // Use independently
281
+ console.log(await s2t("简体")); // 簡體
282
+ console.log(await s2tw("软件")); // 軟件 (character-level only; use s2twp for 軟體)
283
+ console.log(await t2s("繁體")); // 繁体
284
+ ```
285
+
286
+ ### TypeScript
287
+
288
+ ```typescript
289
+ import OpenCC from 'opencc-wasm';
290
+
291
+ type ConfigName = 's2t' | 's2tw' | 's2twp' | 't2s';
292
+
293
+ async function convert(config: ConfigName, text: string): Promise<string> {
294
+ const converter = OpenCC.Converter({ config });
295
+ return await converter(text);
296
+ }
297
+
298
+ const result = await convert('s2tw', '简体中文');
299
+ ```
300
+
301
+ ## 🏗️ Build
302
+
303
+ The project uses a two-stage build process:
304
+
305
+ ### Stage 1: Build WASM
45
306
 
46
307
  ```bash
47
308
  ./build.sh
48
309
  ```
49
310
 
50
- Compiles OpenCC + marisa-trie to WASM and generates intermediate build artifacts in `build/`:
51
- - `build/opencc-wasm.esm.js` - ESM WASM glue (for tests/development)
52
- - `build/opencc-wasm.cjs` - CJS WASM glue (for tests/development)
311
+ Compiles OpenCC + marisa-trie to WASM, outputs to `build/`:
312
+ - `build/opencc-wasm.esm.js` - ESM WASM glue
313
+ - `build/opencc-wasm.cjs` - CJS WASM glue
53
314
  - `build/opencc-wasm.wasm` - WASM binary
54
315
 
55
- **Semantic: `build/` = internal intermediate artifacts, not for publishing**
56
-
57
- ### Stage 2: Build API wrappers (publishable dist)
316
+ ### Stage 2: Build API
58
317
 
59
318
  ```bash
60
319
  node scripts/build-api.js
61
320
  ```
62
321
 
63
322
  Generates publishable distribution in `dist/`:
64
- - Copies WASM artifacts from `build/` to `dist/esm/` and `dist/cjs/`
65
- - Transforms source `index.js` to `dist/esm/index.js` with production paths
66
- - Generates `dist/cjs/index.cjs` with CJS-compatible wrapper
323
+ - Copies WASM files to `dist/esm/` and `dist/cjs/`
324
+ - Transforms source to production paths
67
325
  - Copies data files to `dist/data/`
68
326
 
69
- **Semantic: `dist/` = publishable artifacts for npm**
70
-
71
- ### Complete build
327
+ ### Complete Build
72
328
 
73
329
  ```bash
74
330
  npm run build
@@ -76,54 +332,101 @@ npm run build
76
332
 
77
333
  Runs both stages automatically.
78
334
 
79
- ## Testing
335
+ ## 🧪 Testing
336
+
80
337
  ```bash
81
338
  npm test
82
339
  ```
83
340
 
84
- Tests import from source `index.js`, which references `build/` artifacts.
85
- This ensures tests validate the actual build output, not stale dist files.
86
-
87
- Runs the upstream OpenCC testcases (converted to JSON) against the WASM build.
341
+ Runs the upstream OpenCC test cases against the WASM build.
88
342
 
89
- ## Project Structure
343
+ ## 📁 Project Structure
90
344
 
91
345
  ```
92
346
  wasm-lib/
93
- ├── build/ ← Intermediate WASM artifacts (gitignored, for tests)
94
- │ ├── opencc-wasm.esm.js
95
- │ ├── opencc-wasm.cjs
96
- │ └── opencc-wasm.wasm
97
- ├── dist/ ← Publishable distribution (committed to git)
347
+ ├── build/ ← Intermediate WASM artifacts (gitignored)
348
+ ├── dist/ ← Publishable distribution (committed)
98
349
  │ ├── esm/
99
350
  │ │ ├── index.js
100
- │ │ └── opencc-wasm.js
351
+ │ │ ├── opencc-wasm.js
352
+ │ │ └── opencc-wasm.wasm
101
353
  │ ├── cjs/
102
354
  │ │ ├── index.cjs
103
- │ │ └── opencc-wasm.cjs
104
- │ ├── opencc-wasm.wasm
105
- │ └── data/ ← OpenCC config + dict files
106
- ├── index.js ← Source API (references build/ for tests)
355
+ │ │ ├── opencc-wasm.cjs
356
+ │ │ └── opencc-wasm.wasm
357
+ │ └── data/ ← OpenCC configs + dicts
358
+ ├── index.js ← Source API
107
359
  ├── index.d.ts ← TypeScript definitions
108
360
  └── scripts/
109
- └── build-api.js ← Transforms build/ → dist/
361
+ └── build-api.js ← Build script
110
362
  ```
111
363
 
112
- **Invariants:**
113
- - Tests import source (`index.js`) → loads from `build/`
114
- - Published package exports `dist/` only
115
- - `build/` = internal, `dist/` = publishable
364
+ ## ❓ FAQ
365
+
366
+ **Q: Do configs and dicts auto-load or do I need to download them?**
367
+
368
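+ A: They auto-load. The high-level API (`OpenCC.Converter()`) fetches the required configs and dictionaries automatically: from the CDN when the module is imported from a CDN URL, and from the files bundled in the package when installed via npm.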
369
+
370
+ **Q: Does it re-download every time?**
371
+
372
+ A: No! Resources are cached after first load.
373
+
374
+ **Q: Does it work offline?**
375
+
376
+ A: Yes. If installed via npm, all resources are bundled. For browsers, use a Service Worker for offline caching.
377
+
378
+ **Q: Which method should I use: `config` or `from`/`to`?**
379
+
380
+ A: Both work identically. Use `config` if you know the OpenCC config names, or `from`/`to` for a locale-based approach.
381
+
382
+ **Q: Why is the first conversion slow?**
383
+
384
+ A: Initial load downloads configs + dicts (~1-2MB). Subsequent conversions are fast (cached).
385
+
386
+ ## 📝 Notes
387
+
388
+ - Uses persistent OpenCC handles to avoid reloading configs
389
+ - Dictionaries stored in `/data/dict/` in virtual FS
390
+ - Memory grows on demand (`ALLOW_MEMORY_GROWTH=1`)
391
+ - Performance: Focuses on fidelity and compatibility with official OpenCC. May be slower than pure-JS implementations for raw throughput, but guarantees full OpenCC behavior.
392
+
393
+ ## 📜 Changelog
394
+
395
+ ### 0.4.0 - 2026-01-04
396
+
397
+ **Added:**
398
+ - `config` parameter in `Converter()` for direct OpenCC config names
399
+ - New CN Government Standard conversions: `t2cngov`, `t2cngov_keep_simp`
400
+ - New demo page and regression tests for new configs
401
+
402
+ **Fixed:**
403
+ - s2twp duplication bug (issue #950)
404
+ - tw2sp `方程式` conversion regression and dictionary sync
405
+ - Missing cngov configs/dicts in wasm-lib distribution
406
+
407
+ ### 0.3.0 - 2026-01-03
408
+
409
+ **🚨 BREAKING: New Distribution Layout**
410
+
411
+ `.wasm` files moved to be co-located with glue code:
412
+ - `dist/esm/opencc-wasm.wasm` (was: `dist/opencc-wasm.esm.wasm`)
413
+ - `dist/cjs/opencc-wasm.wasm` (was: `dist/opencc-wasm.cjs.wasm`)
414
+
415
+ **Added:**
416
+ - CDN support for direct browser usage
417
+ - Comprehensive test suite
418
+ - Auto-loading of configs and dictionaries
419
+
420
+ ### 0.2.1
421
+
422
+ - Ship both wasm filenames for compatibility
423
+
424
+ ### 0.2.0
116
425
 
117
- ## Notes
118
- - Internally uses persistent OpenCC handles (`opencc_create/convert/destroy`) to avoid reloading configs.
119
- - Dictionaries are written under `/data/dict/` in the virtual FS; configs under `/data/config/`. Paths inside configs are rewritten automatically.
120
- - Memory grows on demand (`ALLOW_MEMORY_GROWTH=1`); no native dependencies needed.
121
- - Performance note: opencc-wasm focuses on fidelity and compatibility (uses official configs and `.ocd2`, matches Node OpenCC output 1:1). Raw throughput can be slower than pure JS implementations like `opencc-js`, but the WASM version guarantees full OpenCC behavior and config coverage.
426
+ - Rebuilt from OpenCC commit [`36c7cbbc`](https://github.com/frankslin/OpenCC/commit/36c7cbbc9702d2a46a89ea7a55ff8ba5656455df)
427
+ - New dist layout with ESM/CJS separation
428
+ - Tests rewritten using `node:test`
122
429
 
123
- ## 0.2.1 changes
124
- - Ship both wasm filenames (`opencc-wasm.wasm` and `opencc-wasm.esm.wasm`) in `dist/` so either glue name resolves without patches; glues remain at `dist/esm/opencc-wasm.js` and `dist/cjs/opencc-wasm.cjs`.
430
+ ---
125
431
 
126
- ## 0.2.0 changes
127
- - Conversion rules and bundled dictionaries are rebuilt from OpenCC commit [`36c7cbbc`](https://github.com/frankslin/OpenCC/commit/36c7cbbc9702d2a46a89ea7a55ff8ba5656455df). This aligns the WASM build with the upstream configs in that revision (including updated `.ocd2` data).
128
- - Output layout now mirrors the new `dist/` structure: ESM glue under `dist/esm/`, CJS glue under `dist/cjs/`, shared `opencc-wasm.wasm` at `dist/opencc-wasm.wasm`, and configs/dicts in `dist/data/`. Adjust your bundler/static hosting paths accordingly.
129
- - Tests are rewritten to use `node:test` with data-driven cases (`test/testcases.json`) instead of ad-hoc assertions, keeping coverage aligned with upstream OpenCC fixtures.
432
+ **Made with ❤️ for the Chinese NLP community**