@dev-pi2pie/word-counter 0.0.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +222 -0
- package/dist/cjs/index.cjs +754 -0
- package/dist/cjs/index.cjs.map +1 -0
- package/dist/esm/bin.mjs +979 -0
- package/dist/esm/bin.mjs.map +1 -0
- package/dist/esm/index.d.mts +75 -0
- package/dist/esm/index.mjs +742 -0
- package/dist/esm/index.mjs.map +1 -0
- package/package.json +46 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 dev-pi2pie

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,222 @@
# Word Counter

Locale-aware word counting powered by the built-in [`Intl.Segmenter`](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) API. The script automatically detects the primary writing system for each portion of the input, segments the text with the matching locale, and reports word totals per language.

## How It Works

- The runtime inspects each character's Unicode script to infer its likely locale (e.g., `und-Latn`, `zh-Hans`, `ja`).
- Adjacent characters that share the same locale are grouped into a chunk.
- Each chunk is counted with `Intl.Segmenter` at `granularity: "word"`, caching segmenters to avoid re-instantiation.
- Per-locale counts are summed into an overall total and printed to stdout.

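The steps above can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the package's implementation: the function names (`detectLocale`, `chunkByLocale`, `countWords`, `countByLocale`) and the script-to-locale table are hypothetical, and mapping every Han character to `zh-Hans` is a deliberate simplification.

```js
// Hypothetical sketch; none of these names come from the package's source.

// Guess a locale for one character from its Unicode script.
function detectLocale(char) {
  if (/[\p{Script=Hiragana}\p{Script=Katakana}]/u.test(char)) return "ja";
  if (/\p{Script=Hangul}/u.test(char)) return "ko";
  if (/\p{Script=Han}/u.test(char)) return "zh-Hans"; // simplification
  if (/\p{Script=Arabic}/u.test(char)) return "ar";
  if (/\p{Script=Latin}/u.test(char)) return "und-Latn";
  return null; // spaces, digits, punctuation carry no script hint
}

// Group adjacent characters that share a locale into chunks.
// Characters without a script hint join the chunk that precedes them.
function chunkByLocale(text) {
  const chunks = [];
  for (const char of text) {
    const last = chunks.at(-1);
    const locale = detectLocale(char) ?? last?.locale ?? "und-Latn";
    if (last && last.locale === locale) last.text += char;
    else chunks.push({ locale, text: char });
  }
  return chunks;
}

// Cache one segmenter per locale to avoid re-instantiation.
const segmenters = new Map();
function getSegmenter(locale) {
  if (!segmenters.has(locale)) {
    segmenters.set(locale, new Intl.Segmenter(locale, { granularity: "word" }));
  }
  return segmenters.get(locale);
}

// Count only the segments that Intl.Segmenter marks as word-like.
function countWords(text, locale) {
  let count = 0;
  for (const segment of getSegmenter(locale).segment(text)) {
    if (segment.isWordLike) count += 1;
  }
  return count;
}

// Sum per-locale counts into an overall total.
function countByLocale(text) {
  const perLocale = {};
  let total = 0;
  for (const chunk of chunkByLocale(text)) {
    const words = countWords(chunk.text, chunk.locale);
    perLocale[chunk.locale] = (perLocale[chunk.locale] ?? 0) + words;
    total += words;
  }
  return { total, perLocale };
}

console.log(countByLocale("Hello 世界 안녕"));
```

Exact counts for CJK text depend on the ICU data shipped with the runtime, so the sketch prints the result rather than asserting a fixed number.
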
## Installation

### For Development

Clone the repository and set up locally:

```bash
git clone https://github.com/dev-pi2pie/word-counter.git
cd word-counter
bun install
bun run build
npm link
```

After linking, you can use the `word-counter` command globally:

```bash
word-counter "Hello 世界 안녕"
```

To use the linked package inside another project:

```bash
npm link @dev-pi2pie/word-counter
```

To uninstall the global link:

```bash
npm unlink --global @dev-pi2pie/word-counter
```

### From npm Registry (npmjs.com)

```bash
npm install -g @dev-pi2pie/word-counter@latest
```

### From GitHub Packages

If your scope is configured to use GitHub Packages:

```bash
# ~/.npmrc
@dev-pi2pie:registry=https://npm.pkg.github.com
```

```bash
npm install -g @dev-pi2pie/word-counter@latest
```

If your scope is configured to use npmjs instead, the same scoped package name
will resolve from npmjs.com (see the npm registry section above).

> [!note]
> **npm** may show newer releases (for example, `v0.0.6`) while GitHub Packages still lists `v0.0.5`.
> This is historical; releases are kept in sync starting with `v0.0.6`.

## Usage

Once installed (via `npm link`, npm registry, or GitHub Packages), you can use the CLI directly:

```bash
word-counter "Hello 世界 안녕"
```

Alternatively, run the built CLI with Node:

```bash
node dist/esm/bin.mjs "Hello 世界 안녕"
```

You can also pipe text:

```bash
echo "こんにちは world مرحبا" | word-counter
```

Or read from a file:

```bash
word-counter --path ./fixtures/sample.txt
```

## Library Usage

The package exports can be used after installing from the npm registry or GitHub Packages, or after linking locally with `npm link`.

### ESM

```js
import wordCounter, { countWordsForLocale, segmentTextByLocale } from "@dev-pi2pie/word-counter";
```

### CJS

```js
const wordCounter = require("@dev-pi2pie/word-counter");
const { countWordsForLocale, segmentTextByLocale, showSingularOrPluralWord } = wordCounter;
```

### Display Modes

Choose a breakdown style with `--mode` (or `-m`):

- `chunk` (default) – list each contiguous locale block in order of appearance.
- `segments` – show the actual wordlike segments used for counting.
- `collector` – aggregate counts per locale regardless of text position.

Examples:

```bash
# chunk mode (default)
word-counter "飛鳥 bird 貓 cat; how do you do?"

# show captured segments
word-counter --mode segments "飛鳥 bird 貓 cat; how do you do?"

# aggregate per locale
word-counter -m collector "飛鳥 bird 貓 cat; how do you do?"
```

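The difference between `chunk` and `collector` can be pictured as a fold over the chunk list. The sketch below is only an illustration: the record shape (`locale`, `words`), the helper name `collectByLocale`, and the word counts are made-up values, not the package's actual output.

```js
// Hypothetical chunk records in order of appearance (illustrative numbers).
const chunks = [
  { locale: "zh-Hans", words: 1 },
  { locale: "und-Latn", words: 1 },
  { locale: "zh-Hans", words: 1 },
  { locale: "und-Latn", words: 5 },
];

// Collector-style view: one entry per locale, regardless of position.
// A Map keeps locales in order of first appearance.
function collectByLocale(records) {
  const totals = new Map();
  for (const { locale, words } of records) {
    totals.set(locale, (totals.get(locale) ?? 0) + words);
  }
  return [...totals].map(([locale, words]) => ({ locale, words }));
}

console.log(collectByLocale(chunks));
// [ { locale: 'zh-Hans', words: 2 }, { locale: 'und-Latn', words: 6 } ]
```
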
### Section Modes (Frontmatter)

Use `--section` to control which parts of a markdown document are counted:

- `all` (default) – count the whole file (fast path, no section split).
- `split` – count frontmatter and content separately.
- `frontmatter` – count frontmatter only.
- `content` – count content only.
- `per-key` – count frontmatter per key (frontmatter only).
- `split-per-key` – per-key frontmatter counts plus a content total.

Supported frontmatter formats:

- YAML fenced with `---`
- TOML fenced with `+++`
- JSON fenced with `;;;` or a top-of-file JSON object (`{ ... }`)

Examples:

```bash
word-counter --section split -p examples/yaml-basic.md
word-counter --section per-key -p examples/yaml-basic.md
word-counter --section split-per-key -p examples/yaml-basic.md
```

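To make the fence rules for the supported frontmatter formats concrete, here is a rough sketch of how a document might be split into frontmatter and content. It is an assumption-laden illustration, not the package's parser: `splitFrontmatter` and its return shape are hypothetical, and the top-of-file JSON object form (`{ ... }`) is intentionally left out.

```js
// Hypothetical helper; the real CLI may detect frontmatter differently.
const FENCES = new Map([
  ["---", "yaml"],
  ["+++", "toml"],
  [";;;", "json"],
]);

function splitFrontmatter(source) {
  const lines = source.split(/\r?\n/);
  const fence = lines[0].trim();
  const type = FENCES.get(fence);
  // No recognised opening fence: treat the whole file as content.
  if (!type) return { type: null, frontmatter: "", content: source };

  // The closing fence is the next line that repeats the opening fence.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === fence);
  if (close === -1) return { type: null, frontmatter: "", content: source };

  return {
    type,
    frontmatter: lines.slice(1, close).join("\n"),
    content: lines.slice(close + 1).join("\n"),
  };
}

console.log(splitFrontmatter("---\ntitle: Hello\n---\nBody text here\n"));
```

With a split like this, `frontmatter` and `content` can each be counted on their own, which is the behaviour the `split` mode describes.
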
JSON output includes a `source` field (`frontmatter` or `content`) to avoid key collisions:

```bash
word-counter --section split-per-key --format json -p examples/yaml-content-key.md
```

Example (trimmed):

```json
{
  "section": "split-per-key",
  "frontmatterType": "yaml",
  "total": 7,
  "items": [
    { "name": "content", "source": "frontmatter", "result": { "total": 3 } },
    { "name": "content", "source": "content", "result": { "total": 4 } }
  ]
}
```

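A consumer of this JSON can use `source` together with `name` to keep the two `content` entries apart. The sketch below embeds the trimmed example as a string and relies only on the fields shown in it; the helper name `indexItems` is hypothetical.

```js
// The trimmed example output, embedded here for illustration.
const output = JSON.parse(`{
  "section": "split-per-key",
  "frontmatterType": "yaml",
  "total": 7,
  "items": [
    { "name": "content", "source": "frontmatter", "result": { "total": 3 } },
    { "name": "content", "source": "content", "result": { "total": 4 } }
  ]
}`);

// Key each item by source and name so a frontmatter key called "content"
// does not overwrite the document's content total.
function indexItems(items) {
  const index = new Map();
  for (const item of items) {
    index.set(`${item.source}:${item.name}`, item.result.total);
  }
  return index;
}

const index = indexItems(output.items);
console.log(index.get("frontmatter:content")); // 3
console.log(index.get("content:content")); // 4
```

Keying by `name` alone would leave a single `content` entry, which is the collision the `source` field is there to prevent.
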
### Output Formats

Select how results are printed with `--format`:

- `standard` (default) – total plus per-locale breakdown.
- `raw` – only the total count (single number).
- `json` – machine-readable output; add `--pretty` for indentation.

Examples:

```bash
word-counter --format raw "Hello world"
word-counter --format json --pretty "Hello world"
```

## Locale Detection Notes (Migration)

- Ambiguous Latin text now uses `und-Latn` instead of defaulting to `en`.
- Use `--mode chunk`/`--mode segments` or `--format json` to see the exact locale assigned to each chunk.
- Regex/script-only detection cannot reliably identify English vs. other Latin-script languages; 100% certainty requires explicit metadata (document language tags, user-provided locale, headers) or a language-ID model.

## Testing

Run the build before tests so the CJS interop test can load the emitted
`dist/cjs/index.cjs` bundle:

```bash
bun run build
bun test
```

## Sample Inputs

Try the following mixed-locale phrases to see how detection behaves:

- `"Hello world 你好世界"`
- `"Bonjour le monde こんにちは 세계"`
- `"¡Hola! مرحبا Hello"`

Each run prints the total word count plus a per-locale breakdown, helping you understand how multilingual text is segmented.

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.