@dev-pi2pie/word-counter 0.1.3-canary.2 → 0.1.4-canary.1

package/README.md CHANGED
@@ -139,6 +139,29 @@ word-counter --path ./examples/test-case-multi-files-support --keep-progress
 
  Progress is transient by default, auto-disabled for single-input runs, and suppressed in `--format raw` and `--format json`.
 
+ ### Batch Concurrency (`--jobs`)
+
+ Use `--jobs` to control batch concurrency:
+
+ ```bash
+ word-counter --path ./examples/test-case-multi-files-support --jobs 1
+ word-counter --path ./examples/test-case-multi-files-support --jobs 4
+ ```
+
+ Quick policy:
+
+ - no `--jobs` and `--jobs 1` are equivalent baseline behavior.
+ - `--jobs > 1` enables concurrent `load+count`.
+ - if requested `--jobs` exceeds host `suggestedMaxJobs` (from `--print-jobs-limit`), the CLI warns and runs with the suggested limit as a safety cap.
+
+ Inspect host jobs diagnostics:
+
+ ```bash
+ word-counter --print-jobs-limit
+ ```
+
+ For full policy details, JSON parity expectations (`--misc`, `--total-of whitespace,words`), and benchmark standards, see [`docs/batch-jobs-usage-guide.md`](docs/batch-jobs-usage-guide.md).
+
  ### Stable Path Resolution Contract
 
  - Repeated `--path` values are accepted as mixed inputs (file + directory).
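Note on the `--jobs` hunk above: the "Quick policy" bullets describe a clamp, which can be sketched as follows. This is a minimal illustration and not the package's actual code. `resolveJobs`, `JobsResolution`, and the warning text are hypothetical names, and rejecting values that are not positive integers is an assumption the README does not state.

```typescript
// Hypothetical sketch of the "Quick policy" bullets; not the package's implementation.
interface JobsResolution {
  effectiveJobs: number;
  warning?: string;
}

// `requested` models the value given to --jobs (undefined when the flag is absent).
// `suggestedMaxJobs` models the host limit reported by --print-jobs-limit.
function resolveJobs(
  requested: number | undefined,
  suggestedMaxJobs: number
): JobsResolution {
  // No --jobs and --jobs 1 are the same baseline behavior.
  const wanted = requested === undefined ? 1 : requested;

  // Assumption: anything other than a positive integer is rejected.
  if (Math.floor(wanted) !== wanted || wanted < 1) {
    throw new RangeError(`--jobs must be a positive integer, got ${wanted}`);
  }

  // Requests above the host limit are capped, and a warning is reported.
  if (wanted > suggestedMaxJobs) {
    return {
      effectiveJobs: suggestedMaxJobs,
      warning: `--jobs ${wanted} exceeds suggestedMaxJobs ${suggestedMaxJobs}; using ${suggestedMaxJobs}`,
    };
  }

  return { effectiveJobs: wanted };
}
```

Under these assumptions, `resolveJobs(8, 4)` yields `effectiveJobs: 4` with a warning, while `resolveJobs(undefined, 4)` and `resolveJobs(1, 4)` both yield the baseline of 1 with no warning.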
@@ -591,27 +614,10 @@ Example JSON (trimmed):
 
  ## Locale Tag Detection Notes
 
- - Detection is regex/script based (Unicode script checks), not a statistical language-ID model.
- - Ambiguous Latin text uses `und-Latn` unless a Latin hint is provided.
- - Han-script fallback uses `und-Hani` by default because regex script checks cannot natively distinguish `zh-Hans` vs `zh-Hant`.
- - Current built-in Latin diacritic heuristics include:
- - `de`: `äöüÄÖÜß`
- - `es`: `ñÑ¿¡`
- - `pt`: `ãõÃÕ`
- - `fr`: `œŒæÆ`
- - `pl`: `ąćęłńśźżĄĆĘŁŃŚŹŻ`
- - `tr`: `ıİğĞşŞ`
- - `ro`: `ăĂâÂîÎșȘțȚ`
- - `hu`: `őŐűŰ`
- - `is`: `ðÐþÞ`
- - Latin text with other European diacritics may still remain in `und-Latn` unless a hint is provided.
- - Use `--mode chunk`/`--mode segments` or `--format json` to see the exact locale tag assigned to each chunk.
- - Regex/script-only detection cannot reliably identify English vs. other Latin-script languages; 100% certainty requires explicit metadata (document language tags, user-provided locale, headers) or a language-ID model.
- - Use `--latin-language <tag>` or `--latin-tag <tag>` for ambiguous Latin text.
- - Use `--latin-hint <tag>=<pattern>` (repeatable) and `--latin-hints-file <path>` to add custom Latin rules.
- - Use `--no-default-latin-hints` to disable built-in Latin diacritic rules.
- - Use `--han-language <tag>` or `--han-tag <tag>` for Han-script fallback.
- - `--latin-locale` remains supported as a legacy alias for now and is planned for future deprecation.
+ - Detection is regex/script based, not statistical language-ID.
+ - Ambiguous Latin defaults to `und-Latn`; Han fallback defaults to `und-Hani`.
+ - Use explicit tag and hint flags when you need deterministic tagging.
+ - Full notes (built-in heuristics, limitations, and override guidance) are tracked in `docs/locale-tag-detection-notes.md`.
 
  ## Breaking Changes Notes
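Note on the Locale Tag Detection Notes hunk above: the regex/script-based approach it describes might look roughly like the sketch below. This is an illustration only and not the package's code. `detectLocaleTag` and `LATIN_HINTS` are hypothetical names, only four of the diacritic hints from the removed lines are included, and three behaviors are assumptions: Han is checked before Latin, the first matching hint wins, and text in neither script returns the placeholder `und`. The notes say the real tool tags each chunk separately, whereas this sketch tags a whole string.

```typescript
// Hypothetical sketch of regex/script-based tagging; not the package's implementation.
// Hint characters are copied from the diacritic heuristics listed in the removed lines.
const LATIN_HINTS: ReadonlyArray<readonly [string, RegExp]> = [
  ["de", /[äöüÄÖÜß]/],
  ["es", /[ñÑ¿¡]/],
  ["pt", /[ãõÃÕ]/],
  ["fr", /[œŒæÆ]/],
];

// Unicode script checks via property escapes (these require the "u" flag).
const HAN = new RegExp("\\p{Script=Han}", "u");
const LATIN = new RegExp("\\p{Script=Latin}", "u");

function detectLocaleTag(
  text: string,
  hints: ReadonlyArray<readonly [string, RegExp]> = LATIN_HINTS
): string {
  // Script checks cannot tell zh-Hans from zh-Hant, so Han falls back to und-Hani.
  if (HAN.test(text)) return "und-Hani";

  if (LATIN.test(text)) {
    // Assumption: the first hint whose pattern matches decides the tag.
    for (const [tag, pattern] of hints) {
      if (pattern.test(text)) return tag;
    }
    // Ambiguous Latin text stays undetermined.
    return "und-Latn";
  }

  // Assumption: a bare "und" placeholder for text in neither script.
  return "und";
}
```

Passing an empty hint list, as in `detectLocaleTag("Straße", [])`, loosely mirrors what `--no-default-latin-hints` is described as doing: the text falls back to `und-Latn` instead of `de`.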