@dev-pi2pie/word-counter 0.1.5-canary.3 → 0.1.5-canary.4

package/README.md CHANGED
@@ -101,18 +101,94 @@ Enable the optional WASM detector for ambiguous Latin and Han routes:
  ```bash
  word-counter --detector wasm "This sentence should clearly be detected as English for the wasm detector path."
  word-counter --detector wasm "漢字測試需要更多內容才能觸發偵測"
+ word-counter --detector wasm --content-gate strict "Internationalization documentation remains understandable."
+ word-counter --detector wasm --content-gate loose "四字成語"
+ word-counter --detector wasm --content-gate off "mode: debug\ntee: true\npath: logs\nUse this for testing."
+ ```
+
+ Inspect detector behavior without count output:
+
+ ```bash
+ word-counter inspect "こんにちは、世界!これはテストです。"
+ word-counter inspect --view engine "This sentence should clearly be detected as English for the wasm detector path."
+ word-counter inspect --detector regex -f json "こんにちは、世界!これはテストです。"
+ word-counter inspect --detector regex -f json --pretty "こんにちは、世界!これはテストです。"
+ word-counter inspect --detector wasm --content-gate off "mode: debug\ntee: true\npath: logs\nUse this for testing."
+ word-counter inspect -p ./examples/yaml-basic.md
+ word-counter inspect -p ./examples/test-case-multi-files-support
+ word-counter inspect -p ./examples/test-case-multi-files-support --section content -f json --pretty
  ```
 
  Detector mode notes:
 
  - `--detector regex` is the default behavior.
  - `--detector wasm` only runs for ambiguous `und-Latn` and `und-Hani` chunks.
+ - `--content-gate default|strict|loose|off` configures the shared detector policy mode used by the WASM detector path.
+   - `default`: current fixture-backed project policy
+   - `strict`: raises detector eligibility thresholds and makes more borderline windows fall back
+   - `loose`: lowers detector eligibility thresholds and makes more borderline windows eligible or upgradable
+   - `off`: bypasses `contentGate` evaluation only
+   - mode behavior differs by route:
+     - `und-Latn`: `default|strict|loose` affect both eligibility and the Latin prose-style `contentGate`
+     - `und-Hani`: `default|strict|loose` affect eligibility only, while `contentGate` still reports `policy=none`
+   - current Hani behavior:
+     - `default`: keeps the current Hani diagnostic-sample threshold
+     - `strict`: raises the Hani diagnostic-sample threshold
+     - `loose`: uses a short-window Han-focused threshold so idiom-length samples such as `四字成語` can become eligible
+     - `off`: keeps the same Hani eligibility thresholds as `default`
  - `--detector regex` keeps the original script/regex chunk-first detection path.
  - `--detector wasm` uses a detector-oriented ambiguous-window scoring pass before accepted tags are projected back onto the counting chunks.
  - In `--detector wasm` mode, Latin hint rules and explicit Latin hint flags are deferred until after detector evaluation and only relabel unresolved `und-Latn` output.
  - Very short chunks stay on the original `und-*` fallback.
  - Low-confidence or unsupported detector results fall back to `und-*`.
  - Technical-noise-heavy Latin windows stay conservative and may remain `und-Latn` even when the detector produces a wrong-but-confident language guess.
+ - inspect/debug disclosure uses `contentGate` as the canonical gate field.
+ - legacy debug/evidence payloads still emit `qualityGate` as a compatibility alias derived from `contentGate.passed`.
+ - for practical verification, use `inspect` to compare direct mode outcomes across `default`, `strict`, `loose`, and `off`; use `--debug --detector-evidence` when you specifically need counting-flow event details or legacy `qualityGate` compatibility
+ - `word-counter inspect` supports:
+   - positional text input
+   - one direct `-p, --path <file>` input
+   - repeated `-p, --path` inputs for batch inspect
+   - directory inputs in default `--path-mode auto`
+   - literal file-only path handling in `--path-mode manual`
+   - `--section all|frontmatter|content`
+ - batch inspect keeps counting-style path acquisition but not counting aggregation:
+   - no inspect `--merged`
+   - no inspect `--per-file`
+   - no inspect `--jobs`
+
+ ### Detector Subpath (`@dev-pi2pie/word-counter/detector`)
+
+ Use the detector subpath when you need async detector-aware APIs directly in library code.
+
+ ```ts
+ import {
+   inspectTextWithDetector,
+   segmentTextByLocaleWithDetector,
+   wordCounterWithDetector,
+ } from "@dev-pi2pie/word-counter/detector";
+
+ const inspectResult = await inspectTextWithDetector("こんにちは、世界!これはテストです。", {
+   detector: "wasm",
+   view: "pipeline",
+ });
+ const countResult = await wordCounterWithDetector(
+   "Internationalization documentation remains understandable.",
+   {
+     detector: "wasm",
+     contentGate: { mode: "strict" },
+   },
+ );
+ ```
+
+ Detector subpath notes:
+
+ - detector entrypoints are async
+ - use the root package for normal counting when you do not need detector-specific control
+ - detector-subpath APIs that execute detector policy also accept:
+   - `contentGate: { mode: "default" | "strict" | "loose" | "off" }`
+ - use `detectorDebug` for counting-flow runtime diagnostics
+ - use `inspectTextWithDetector()` for direct detector diagnosis as structured data
 
  Collect non-words (emoji/symbols/punctuation):
 
@@ -500,6 +576,7 @@ Import from `@dev-pi2pie/word-counter/detector` for the explicit detector-enable
  | `wordCounterWithDetector` | function | Async detector-aware counting entrypoint. |
  | `segmentTextByLocaleWithDetector` | function | Async detector-aware locale segmentation. |
  | `countSectionsWithDetector` | function | Async detector-aware section counting. |
+ | `inspectTextWithDetector` | function | Async detector-aware inspect entrypoint. |
  | `DEFAULT_DETECTOR_MODE` | value | Current default detector mode (`regex`). |
  | `DETECTOR_MODES` | value | Supported detector modes. |
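
The detector mode notes added in this README diff suggest using `inspect` to compare outcomes across `default`, `strict`, `loose`, and `off`. A minimal sketch of how that comparison could be scripted follows. The helper names `buildInspectArgs` and `buildModeComparison` are hypothetical and are not part of `@dev-pi2pie/word-counter`; only the `inspect`, `--detector wasm`, and `--content-gate <mode>` arguments come from the README text above. The sketch only builds argument lists and does not run the CLI.

```typescript
// Hypothetical helpers for illustration; not part of @dev-pi2pie/word-counter.
// The mode names come from the README's `--content-gate default|strict|loose|off`.
const CONTENT_GATE_MODES = ["default", "strict", "loose", "off"] as const;
type ContentGateMode = (typeof CONTENT_GATE_MODES)[number];

// Build the argument list for one `word-counter inspect` run on the WASM
// detector path, mirroring the flag order used in the README examples.
function buildInspectArgs(text: string, mode: ContentGateMode): string[] {
  return ["inspect", "--detector", "wasm", "--content-gate", mode, text];
}

// Build one argument list per mode so the same sample text can be
// inspected under every content-gate setting.
function buildModeComparison(text: string): Record<ContentGateMode, string[]> {
  const runs = {} as Record<ContentGateMode, string[]>;
  for (const mode of CONTENT_GATE_MODES) {
    runs[mode] = buildInspectArgs(text, mode);
  }
  return runs;
}

// Sample text taken from the README's `--content-gate loose` example.
const runs = buildModeComparison("四字成語");
for (const mode of CONTENT_GATE_MODES) {
  console.log(mode, JSON.stringify(runs[mode]));
}
```

Each argument list could then be handed to a process runner such as Node's `child_process.execFile` with `word-counter` as the command. Keeping the text as a separate array element avoids shell-quoting issues for samples that contain spaces or newlines.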