thai-cut-browser 1.0.0 → 1.1.0

package/README.md CHANGED
@@ -71,6 +71,94 @@ const segmented = wordcut.cutIntoArray("ฉันชอบกินข้าว
  ```
 
 
+ Asynchronous API
+ ----------------
+
+ The package provides an async entry point at `thai-cut-browser/async` that lets you initialize a segmentation instance from a dictionary fetched at runtime. This keeps the 500 KB+ embedded dictionary out of your main JavaScript chunk — the dictionary is loaded on demand and can be cached independently by your CDN or service worker.
+
+ ```ts
+ import { createWordcutAsync } from "thai-cut-browser/async";
+ ```
+
+ `createWordcutAsync(options)` returns a `Promise<WordcutInstance>`. The resolved instance exposes the same `cut`, `cutIntoArray`, and `cutIntoRanges` methods as the synchronous API and produces identical segmentation results for the same dictionary and input.
+
+ ### Options (`CreateWordcutAsyncOptions`)
+
+ | Field | Type | Description |
+ |---|---|---|
+ | `dictionarySource` | `DictionarySource` | The words to load. Required unless `noDict` is `true`. |
+ | `additionalWords` | `string[]` | Optional extra words merged into the dictionary after the source resolves. |
+ | `noDict` | `boolean` | If `true`, build a dictionary-free instance. Mutually exclusive with `dictionarySource`. |
+
+ ### `DictionarySource` variants
+
+ The `dictionarySource` field accepts three forms:
+
+ - **`string[]`** — a pre-loaded array of words. Useful when you already have the dictionary in memory.
+ - **`Promise<string[]>`** — a promise that resolves to the word array. Ideal for wrapping a `fetch` call.
+ - **`() => Promise<string[]>`** — a factory function returning a promise. It is invoked exactly once, synchronously within the `createWordcutAsync` call, so the fetch starts only when you call `createWordcutAsync`, not when the options object is built. Use this form to defer loading until segmentation is first needed.
+
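The three `dictionarySource` forms listed in this hunk can be collapsed into a single promise. The following is a minimal sketch of one way to do that, not the package's implementation: the function name `resolveDictionarySource` is hypothetical, and only the `DictionarySource` type shape is taken from the README text.

```ts
// Hypothetical sketch: the type mirrors the README, the function name is assumed.
type DictionarySource = string[] | Promise<string[]> | (() => Promise<string[]>);

// Collapse the three accepted forms into one promise of words.
function resolveDictionarySource(source: DictionarySource): Promise<string[]> {
  if (typeof source === "function") {
    // Factory form: called once, synchronously, so any fetch starts here.
    try {
      return source();
    } catch (err) {
      // A synchronous throw surfaces as a rejection instead of an exception.
      return Promise.reject(err);
    }
  }
  // Array and promise forms: Promise.resolve wraps a plain array and
  // passes an existing promise through.
  return Promise.resolve(source);
}

// Usage: all three calls yield a promise of the same word list.
const words = ["กิน", "ข้าว"];
const fromArray = resolveDictionarySource(words);
const fromPromise = resolveDictionarySource(Promise.resolve(words));
const fromFactory = resolveDictionarySource(() => Promise.resolve(words));
```

In this sketch the factory form differs from the promise form only in when the work begins: a promise passed directly is already in flight, while the factory body does not run until the resolver is called.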
+ ### Error handling
+
+ `createWordcutAsync` rejects the returned promise in the following cases:
+
+ - **Missing options** — if neither `dictionarySource` nor `noDict: true` is provided, the promise rejects with an error indicating that exactly one must be supplied.
+ - **Mutually exclusive options** — if both `dictionarySource` and `noDict: true` are provided, the promise rejects with an error indicating they are mutually exclusive.
+ - **Invalid word arrays** — if the resolved `dictionarySource` or `additionalWords` is not an array of strings (or exceeds 1,000,000 elements), the promise rejects with a `TypeError` whose message identifies the offending option name, the zero-based index of the first non-string element, and its `typeof` result.
+ - **Source failures** — if a `dictionarySource` function throws synchronously or its returned promise rejects, the promise rejects with an error whose `cause` property is the original thrown/rejected value. When the original value is `undefined` or `null`, `cause` is set to an `Error` indicating the source rejected without a reason.
+
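The rejection cases above can be sketched as three small checks. This is an illustration under stated assumptions rather than the package's code: the helper names, the exact message wording, and the `MAX_WORDS` constant are all hypothetical, and the helpers throw synchronously where the real API is described as rejecting its promise.

```ts
// Hypothetical helpers modeled on the prose; names and messages are assumed.
const MAX_WORDS = 1000000;

type ErrorWithCause = Error & { cause?: unknown };

// Exactly one of `dictionarySource` or `noDict: true` must be given.
function assertExclusiveOptions(opts: { dictionarySource?: unknown; noDict?: boolean }): void {
  const hasSource = opts.dictionarySource !== undefined;
  const noDict = opts.noDict === true;
  if (hasSource && noDict) {
    throw new Error("`dictionarySource` and `noDict: true` are mutually exclusive");
  }
  if (!hasSource && !noDict) {
    throw new Error("provide exactly one of `dictionarySource` or `noDict: true`");
  }
}

// Check that `value` is a string array within the size limit.
function validateWords(optionName: string, value: unknown): string[] {
  if (!Array.isArray(value)) {
    throw new TypeError(`${optionName} must be an array of strings, got ${typeof value}`);
  }
  if (value.length > MAX_WORDS) {
    throw new TypeError(`${optionName} exceeds ${MAX_WORDS} elements`);
  }
  for (let i = 0; i < value.length; i++) {
    // Report the first offending element by zero-based index and typeof.
    if (typeof value[i] !== "string") {
      throw new TypeError(`${optionName}[${i}] must be a string, got ${typeof value[i]}`);
    }
  }
  return value as string[];
}

// Wrap a source failure, keeping the original reason as `cause`.
function wrapSourceFailure(reason: unknown): ErrorWithCause {
  const err: ErrorWithCause = new Error("dictionarySource failed");
  // A null or undefined reason is replaced so `cause` is always informative.
  err.cause = reason ?? new Error("dictionarySource rejected without a reason");
  return err;
}
```

Under these assumptions, `validateWords("additionalWords", ["ก", 42])` throws a `TypeError` with the message `additionalWords[1] must be a string, got number`.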
+ ### Example: fetching the dictionary asset with `fetch`
+
+ ```ts
+ import { createWordcutAsync, parseDictionary } from "thai-cut-browser/async";
+ // Bundlers resolve this subpath to a hashed asset URL at build time
+ import dictUrl from "thai-cut-browser/default-dict.txt";
+
+ // Browser / bundler usage — fetch the dictionary asset at runtime
+ const wordcut = await createWordcutAsync({
+   dictionarySource: async () => {
+     const res = await fetch(dictUrl);
+     if (!res.ok) throw new Error(`Failed to fetch dictionary: ${res.status}`);
+     const text = await res.text();
+     return parseDictionary(text);
+   },
+   additionalWords: ["กินข้าว"],
+ });
+
+ const result = wordcut.cut("ฉันชอบกินข้าว");
+ ```
+
+ ### Resolving the dictionary asset path
+
+ The package ships the default dictionary as a static file at `assets/default-dict.txt` inside the installed package. How you reference it depends on your environment:
+
+ **Node.js (server-side, SSR, scripts):** Call `getDefaultDictPath()` to get the absolute file path of the asset. It uses Node's `require.resolve` under the hood and throws if the asset cannot be located.
+
+ ```ts
+ import { getDefaultDictPath, parseDictionary } from "thai-cut-browser/async";
+ import { readFileSync } from "node:fs";
+
+ const dictPath = getDefaultDictPath();
+ const words = parseDictionary(readFileSync(dictPath, "utf8"));
+ ```
+
+ **Bundlers (Vite, webpack, Rollup):** Import the asset via the package subpath `thai-cut-browser/default-dict.txt`. Most bundlers resolve this to a hashed URL you can pass to `fetch`:
+
+ ```ts
+ import dictUrl from "thai-cut-browser/default-dict.txt";
+ // dictUrl is a string like "/assets/default-dict-abc123.txt" after bundling
+ ```
+
+ The exact import syntax depends on your bundler's asset handling (Vite resolves it automatically; webpack may need the `asset/resource` rule or a `?url` suffix).
+
+ ### Utility exports
+
+ The async entry also re-exports two utility functions for working with the dictionary file format:
+
+ - **`parseDictionary(text: string): string[]`** — parses a UTF-8 dictionary asset (one word per line, LF or CRLF terminated) into an array of words. Empty lines are skipped.
+ - **`serializeDictionary(words: string[]): string`** — converts a word array into the dictionary asset format (LF-terminated lines). Throws if any element is not a string or contains line terminators.
+
+
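The dictionary file format described for `parseDictionary` and `serializeDictionary` is simple enough to sketch in a few lines. The functions below are hypothetical stand-ins written from that description, not the package's exports. Two points are assumptions: "line terminators" is read as CR and LF only, and the thrown error is a `TypeError`, since the README does not name the error type.

```ts
// Hypothetical stand-ins for the described utilities; not the package's code.

// Split on LF or CRLF and drop empty lines, including the trailing one.
function parseDictionarySketch(text: string): string[] {
  return text.split(/\r?\n/).filter((line) => line.length > 0);
}

// Emit one LF-terminated line per word, rejecting values that would
// corrupt the one-word-per-line format.
function serializeDictionarySketch(words: string[]): string {
  return words
    .map((word, i) => {
      if (typeof word !== "string") {
        throw new TypeError(`words[${i}] is not a string`);
      }
      // Assumption: only CR and LF count as line terminators here.
      if (/[\r\n]/.test(word)) {
        throw new TypeError(`words[${i}] contains a line terminator`);
      }
      return word + "\n";
    })
    .join("");
}

// Usage: mixed CRLF/LF input with a blank line parses to three words,
// and serializing then parsing a word list returns the same list.
const parsed = parseDictionarySketch("กิน\r\nข้าว\n\nชอบ\n");
const roundTrip = parseDictionarySketch(serializeDictionarySketch(parsed));
```

Rejecting embedded line terminators at serialization time is what makes the round trip lossless in this sketch: a word containing `\n` would otherwise come back as two words.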
  Development
  -----------