@dev-pi2pie/word-counter 0.0.6

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 dev-pi2pie
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,222 @@
+ # Word Counter
+
+ Locale-aware word counting powered by [`Intl.Segmenter`](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter), part of the ECMAScript Internationalization API. The script automatically detects the primary writing system for each portion of the input, segments the text with the matching locale, and reports word totals per language.
+
+ ## How It Works
+
+ - The runtime inspects each character's Unicode script to infer its likely locale (e.g., `und-Latn`, `zh-Hans`, `ja`).
+ - Adjacent characters that share the same locale are grouped into a chunk.
+ - Each chunk is counted with `Intl.Segmenter` at `granularity: "word"`, with segmenters cached to avoid re-instantiation.
+ - Per-locale counts are summed into an overall total and printed to stdout.
+
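The steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the package's actual source: the `SCRIPT_LOCALES` table, the helper names (`detectLocale`, `chunkByLocale`, `getSegmenter`, `countWords`), and the choice to keep spaces and punctuation with the current chunk are all assumptions made for the example.

```js
// Hypothetical script-to-locale table; the package's real mapping may differ.
const SCRIPT_LOCALES = [
  [/[\p{Script=Hiragana}\p{Script=Katakana}]/u, "ja"],
  [/\p{Script=Hangul}/u, "ko"],
  [/\p{Script=Han}/u, "zh-Hans"],
  [/\p{Script=Arabic}/u, "ar"],
  [/\p{Script=Latin}/u, "und-Latn"],
];

// Infer a locale from one character's Unicode script, or null for neutral
// characters such as spaces, digits, and punctuation.
function detectLocale(char) {
  const match = SCRIPT_LOCALES.find(([pattern]) => pattern.test(char));
  return match ? match[1] : null;
}

// Group adjacent characters that resolve to the same locale into chunks.
// Assumption: neutral characters stay with the current chunk.
function chunkByLocale(text) {
  const chunks = [];
  for (const char of text) {
    const last = chunks[chunks.length - 1];
    const locale = detectLocale(char) ?? (last ? last.locale : "und-Latn");
    if (last && last.locale === locale) last.text += char;
    else chunks.push({ locale, text: char });
  }
  return chunks;
}

// Cache one segmenter per locale to avoid re-instantiation.
const segmenters = new Map();
function getSegmenter(locale) {
  if (!segmenters.has(locale)) {
    segmenters.set(locale, new Intl.Segmenter(locale, { granularity: "word" }));
  }
  return segmenters.get(locale);
}

// Count only segments flagged as word-like (skips spaces and punctuation).
function countWords(text, locale) {
  let count = 0;
  for (const segment of getSegmenter(locale).segment(text)) {
    if (segment.isWordLike) count += 1;
  }
  return count;
}

// Print one line per chunk: the assigned locale and its word count.
const chunks = chunkByLocale("Hello world 世界");
for (const { locale, text } of chunks) {
  console.log(locale, countWords(text, locale));
}
```

Counts for the CJK chunk depend on the ICU data shipped with the runtime, so the sketch prints them rather than assuming a value.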
+ ## Installation
+
+ ### For Development
+
+ Clone the repository and set up locally:
+
+ ```bash
+ git clone https://github.com/dev-pi2pie/word-counter.git
+ cd word-counter
+ bun install
+ bun run build
+ npm link
+ ```
+
+ After linking, you can use the `word-counter` command globally:
+
+ ```bash
+ word-counter "Hello 世界 안녕"
+ ```
+
+ To use the linked package inside another project:
+
+ ```bash
+ npm link @dev-pi2pie/word-counter
+ ```
+
+ To uninstall the global link:
+
+ ```bash
+ npm unlink --global @dev-pi2pie/word-counter
+ ```
+
+ ### From npm Registry (npmjs.com)
+
+ ```bash
+ npm install -g @dev-pi2pie/word-counter@latest
+ ```
+
+ ### From GitHub Packages
+
+ If your scope is configured to use GitHub Packages:
+
+ ```ini
+ # ~/.npmrc
+ @dev-pi2pie:registry=https://npm.pkg.github.com
+ ```
+
+ ```bash
+ npm install -g @dev-pi2pie/word-counter@latest
+ ```
+
+ If your scope is configured to use npmjs instead, the same scoped package name
+ will resolve from npmjs.com (see the npm registry section above).
+
+ > [!note]
+ > **npm** may show newer releases (for example, `v0.0.6`) while GitHub Packages still lists `v0.0.5`.
+ > This gap is historical; releases are kept in sync starting with `v0.0.6`.
+
+ ## Usage
+
+ Once installed (via `npm link`, npm registry, or GitHub Packages), you can use the CLI directly:
+
+ ```bash
+ word-counter "Hello 世界 안녕"
+ ```
+
+ Alternatively, run the built CLI with Node:
+
+ ```bash
+ node dist/esm/bin.mjs "Hello 世界 안녕"
+ ```
+
+ You can also pipe text:
+
+ ```bash
+ echo "こんにちは world مرحبا" | word-counter
+ ```
+
+ Or read from a file:
+
+ ```bash
+ word-counter --path ./fixtures/sample.txt
+ ```
+
+ ## Library Usage
+
+ The package exports can be used after installing from the npm registry or GitHub Packages, or after linking locally with `npm link`.
+
+ ### ESM
+
+ ```js
+ import wordCounter, { countWordsForLocale, segmentTextByLocale } from "@dev-pi2pie/word-counter";
+ ```
+
+ ### CJS
+
+ ```js
+ const wordCounter = require("@dev-pi2pie/word-counter");
+ const { countWordsForLocale, segmentTextByLocale, showSingularOrPluralWord } = wordCounter;
+ ```
+
+ ### Display Modes
+
+ Choose a breakdown style with `--mode` (or `-m`):
+
+ - `chunk` (default) – list each contiguous locale block in order of appearance.
+ - `segments` – show the actual word-like segments used for counting.
+ - `collector` – aggregate counts per locale regardless of text position.
+
+ Examples:
+
+ ```bash
+ # chunk mode (default)
+ word-counter "飛鳥 bird 貓 cat; how do you do?"
+
+ # show captured segments
+ word-counter --mode segments "飛鳥 bird 貓 cat; how do you do?"
+
+ # aggregate per locale
+ word-counter -m collector "飛鳥 bird 貓 cat; how do you do?"
+ ```
+
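The difference between `chunk` and `collector` comes down to whether counts are merged across positions. A rough sketch of that aggregation, assuming each chunk has already been reduced to a `{ locale, words }` pair; that shape, the `collectByLocale` name, and the numbers are hypothetical and are not real CLI output:

```js
// Hypothetical per-chunk results in order of appearance (illustrative numbers).
const chunkCounts = [
  { locale: "zh-Hans", words: 1 },
  { locale: "und-Latn", words: 1 },
  { locale: "zh-Hans", words: 1 },
  { locale: "und-Latn", words: 5 },
];

// collector-style view: sum counts per locale, ignoring text position.
function collectByLocale(items) {
  const totals = new Map();
  for (const { locale, words } of items) {
    totals.set(locale, (totals.get(locale) ?? 0) + words);
  }
  return totals;
}

const totals = collectByLocale(chunkCounts);
console.log([...totals]); // entries: zh-Hans -> 2, und-Latn -> 6
```

A `Map` keeps locales in first-seen order, which makes the aggregated view easy to compare against the chunk view.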
+ ### Section Modes (Frontmatter)
+
+ Use `--section` to control which parts of a markdown document are counted:
+
+ - `all` (default) – count the whole file (fast path, no section split).
+ - `split` – count frontmatter and content separately.
+ - `frontmatter` – count frontmatter only.
+ - `content` – count content only.
+ - `per-key` – count frontmatter per key (frontmatter only).
+ - `split-per-key` – per-key frontmatter counts plus a content total.
+
+ Supported frontmatter formats:
+
+ - YAML fenced with `---`
+ - TOML fenced with `+++`
+ - JSON fenced with `;;;` or a top-of-file JSON object (`{ ... }`)
+
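To make the fence rules concrete, here is a minimal sketch of separating frontmatter from content. It is not the package's parser: `splitFrontmatter` and `FENCES` are hypothetical names, the sketch assumes the opening fence is the first line and is closed by an identical line, and it leaves out the unfenced top-of-file JSON object form.

```js
// Fence markers taken from the list above, mapped to a format name.
const FENCES = new Map([
  ["---", "yaml"],
  ["+++", "toml"],
  [";;;", "json"],
]);

// Split a document into frontmatter and content by its opening/closing fence.
// Returns type: null (and the whole source as content) when no fence matches.
function splitFrontmatter(source) {
  const lines = source.split(/\r?\n/);
  const fence = lines[0].trim();
  const type = FENCES.get(fence);
  const end = type
    ? lines.findIndex((line, i) => i > 0 && line.trim() === fence)
    : -1;
  if (end === -1) return { type: null, frontmatter: "", content: source };
  return {
    type,
    frontmatter: lines.slice(1, end).join("\n"),
    content: lines.slice(end + 1).join("\n"),
  };
}

const doc = "---\ntitle: Hello world\n---\nBody text here";
console.log(splitFrontmatter(doc));
```

With the two parts separated, `split` would count each part on its own, while `frontmatter` and `content` would count only one of them.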
+ Examples:
+
+ ```bash
+ word-counter --section split -p examples/yaml-basic.md
+ word-counter --section per-key -p examples/yaml-basic.md
+ word-counter --section split-per-key -p examples/yaml-basic.md
+ ```
+
+ JSON output includes a `source` field (`frontmatter` or `content`) so that a frontmatter key named `content` is not confused with the document body:
+
+ ```bash
+ word-counter --section split-per-key --format json -p examples/yaml-content-key.md
+ ```
+
+ Example (trimmed):
+
+ ```json
+ {
+ "section": "split-per-key",
+ "frontmatterType": "yaml",
+ "total": 7,
+ "items": [
+ { "name": "content", "source": "frontmatter", "result": { "total": 3 } },
+ { "name": "content", "source": "content", "result": { "total": 4 } }
+ ]
+ }
+ ```
+
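A consumer of this JSON can rely on `source` to tell the two `content` entries apart. The sketch below parses the trimmed example verbatim and sums totals per source; the variable names are illustrative, and fields omitted from the trimmed output are not assumed.

```js
// The trimmed example output from above, as a string.
const output = `{
  "section": "split-per-key",
  "frontmatterType": "yaml",
  "total": 7,
  "items": [
    { "name": "content", "source": "frontmatter", "result": { "total": 3 } },
    { "name": "content", "source": "content", "result": { "total": 4 } }
  ]
}`;

const report = JSON.parse(output);

// Sum item totals per source; grouping by `name` alone would merge both entries.
const bySource = {};
for (const item of report.items) {
  bySource[item.source] = (bySource[item.source] ?? 0) + item.result.total;
}

console.log(bySource); // { frontmatter: 3, content: 4 }
```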
+ ### Output Formats
+
+ Select how results are printed with `--format`:
+
+ - `standard` (default) – total plus per-locale breakdown.
+ - `raw` – only the total count (single number).
+ - `json` – machine-readable output; add `--pretty` for indentation.
+
+ Examples:
+
+ ```bash
+ word-counter --format raw "Hello world"
+ word-counter --format json --pretty "Hello world"
+ ```
+
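The three formats can be thought of as different renderings of one result object. The following sketch only illustrates that idea: the `result` shape, the `formatResult` helper, and the `standard` text layout are assumptions, not the CLI's actual schema or output.

```js
// Hypothetical result shape; the real JSON schema may contain other fields.
const result = { total: 2, breakdown: [{ locale: "und-Latn", words: 2 }] };

// Render one result in the requested format.
function formatResult(data, format = "standard", pretty = false) {
  switch (format) {
    case "raw":
      // Only the total, as a single number.
      return String(data.total);
    case "json":
      // `pretty` mirrors the idea of --pretty: indent with two spaces.
      return JSON.stringify(data, null, pretty ? 2 : undefined);
    case "standard": {
      // Assumed layout: a total line followed by one line per locale.
      const lines = data.breakdown.map(({ locale, words }) => `${locale}: ${words}`);
      return [`Total: ${data.total}`, ...lines].join("\n");
    }
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

console.log(formatResult(result, "raw"));
console.log(formatResult(result, "json", true));
```

`raw` is the natural choice for shell pipelines, since the output needs no parsing.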
+ ## Locale Detection Notes (Migration)
+
+ - Ambiguous Latin text now uses `und-Latn` instead of defaulting to `en`.
+ - Use `--mode chunk`/`--mode segments` or `--format json` to see the exact locale assigned to each chunk.
+ - Regex/script-only detection cannot reliably identify English vs. other Latin-script languages; 100% certainty requires explicit metadata (document language tags, user-provided locale, headers) or a language-ID model.
+
+ ## Testing
+
+ Run the build before tests so the CJS interop test can load the emitted
+ `dist/cjs/index.cjs` bundle:
+
+ ```bash
+ bun run build
+ bun test
+ ```
+
+ ## Sample Inputs
+
+ Try the following mixed-locale phrases to see how detection behaves:
+
+ - `"Hello world 你好世界"`
+ - `"Bonjour le monde こんにちは 세계"`
+ - `"¡Hola! مرحبا Hello"`
+
+ Each run prints the total word count plus a per-locale breakdown, helping you understand how multilingual text is segmented.
+
+ ## License
+
+ This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.