bekindprofanityfilter 0.0.2 → 0.0.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +69 -66
- package/dist/cjs/index.js +55795 -0
- package/dist/cjs/package.json +1 -0
- package/dist/esm/algos/aho-corasick.js.map +1 -0
- package/dist/esm/algos/bloom-filter.js.map +1 -0
- package/dist/esm/algos/context-patterns.js.map +1 -0
- package/dist/esm/index.js.map +1 -0
- package/dist/esm/innocence-scoring.js.map +1 -0
- package/dist/esm/language-detector.js.map +1 -0
- package/dist/esm/language-dicts.js.map +1 -0
- package/dist/esm/languages/arabic-words.js.map +1 -0
- package/dist/esm/languages/bengali-words.js.map +1 -0
- package/dist/esm/languages/brazilian-words.js.map +1 -0
- package/dist/{languages → esm/languages}/chinese-words.js.map +1 -1
- package/dist/{languages → esm/languages}/english-primary-all-languages.js.map +1 -1
- package/dist/esm/languages/english-words.js.map +1 -0
- package/dist/esm/languages/french-words.js.map +1 -0
- package/dist/esm/languages/german-words.js.map +1 -0
- package/dist/esm/languages/hindi-words.js.map +1 -0
- package/dist/esm/languages/innocent-words.js.map +1 -0
- package/dist/esm/languages/italian-words.js.map +1 -0
- package/dist/esm/languages/japanese-words.js.map +1 -0
- package/dist/{languages → esm/languages}/korean-words.js.map +1 -1
- package/dist/esm/languages/russian-words.js.map +1 -0
- package/dist/esm/languages/spanish-words.js.map +1 -0
- package/dist/esm/languages/tamil-words.js.map +1 -0
- package/dist/esm/languages/telugu-words.js.map +1 -0
- package/dist/esm/romanization-detector.js.map +1 -0
- package/package.json +18 -5
- package/dist/algos/aho-corasick.js.map +0 -1
- package/dist/algos/bloom-filter.js.map +0 -1
- package/dist/algos/context-patterns.js.map +0 -1
- package/dist/index.js.map +0 -1
- package/dist/innocence-scoring.js.map +0 -1
- package/dist/language-detector.js.map +0 -1
- package/dist/language-dicts.js.map +0 -1
- package/dist/languages/arabic-words.js.map +0 -1
- package/dist/languages/bengali-words.js.map +0 -1
- package/dist/languages/brazilian-words.js.map +0 -1
- package/dist/languages/english-words.js.map +0 -1
- package/dist/languages/french-words.js.map +0 -1
- package/dist/languages/german-words.js.map +0 -1
- package/dist/languages/hindi-words.js.map +0 -1
- package/dist/languages/innocent-words.js.map +0 -1
- package/dist/languages/italian-words.js.map +0 -1
- package/dist/languages/japanese-words.js.map +0 -1
- package/dist/languages/russian-words.js.map +0 -1
- package/dist/languages/spanish-words.js.map +0 -1
- package/dist/languages/tamil-words.js.map +0 -1
- package/dist/languages/telugu-words.js.map +0 -1
- package/dist/romanization-detector.js.map +0 -1
- /package/dist/{algos → esm/algos}/aho-corasick.d.ts +0 -0
- /package/dist/{algos → esm/algos}/aho-corasick.js +0 -0
- /package/dist/{algos → esm/algos}/bloom-filter.d.ts +0 -0
- /package/dist/{algos → esm/algos}/bloom-filter.js +0 -0
- /package/dist/{algos → esm/algos}/context-patterns.d.ts +0 -0
- /package/dist/{algos → esm/algos}/context-patterns.js +0 -0
- /package/dist/{index.d.ts → esm/index.d.ts} +0 -0
- /package/dist/{index.js → esm/index.js} +0 -0
- /package/dist/{innocence-scoring.d.ts → esm/innocence-scoring.d.ts} +0 -0
- /package/dist/{innocence-scoring.js → esm/innocence-scoring.js} +0 -0
- /package/dist/{language-detector.d.ts → esm/language-detector.d.ts} +0 -0
- /package/dist/{language-detector.js → esm/language-detector.js} +0 -0
- /package/dist/{language-dicts.d.ts → esm/language-dicts.d.ts} +0 -0
- /package/dist/{language-dicts.js → esm/language-dicts.js} +0 -0
- /package/dist/{languages → esm/languages}/arabic-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/arabic-words.js +0 -0
- /package/dist/{languages → esm/languages}/bengali-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/bengali-words.js +0 -0
- /package/dist/{languages → esm/languages}/brazilian-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/brazilian-words.js +0 -0
- /package/dist/{languages → esm/languages}/chinese-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/chinese-words.js +0 -0
- /package/dist/{languages → esm/languages}/english-primary-all-languages.d.ts +0 -0
- /package/dist/{languages → esm/languages}/english-primary-all-languages.js +0 -0
- /package/dist/{languages → esm/languages}/english-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/english-words.js +0 -0
- /package/dist/{languages → esm/languages}/french-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/french-words.js +0 -0
- /package/dist/{languages → esm/languages}/german-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/german-words.js +0 -0
- /package/dist/{languages → esm/languages}/hindi-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/hindi-words.js +0 -0
- /package/dist/{languages → esm/languages}/innocent-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/innocent-words.js +0 -0
- /package/dist/{languages → esm/languages}/italian-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/italian-words.js +0 -0
- /package/dist/{languages → esm/languages}/japanese-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/japanese-words.js +0 -0
- /package/dist/{languages → esm/languages}/korean-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/korean-words.js +0 -0
- /package/dist/{languages → esm/languages}/russian-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/russian-words.js +0 -0
- /package/dist/{languages → esm/languages}/spanish-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/spanish-words.js +0 -0
- /package/dist/{languages → esm/languages}/tamil-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/tamil-words.js +0 -0
- /package/dist/{languages → esm/languages}/telugu-words.d.ts +0 -0
- /package/dist/{languages → esm/languages}/telugu-words.js +0 -0
- /package/dist/{romanization-detector.d.ts → esm/romanization-detector.d.ts} +0 -0
- /package/dist/{romanization-detector.js → esm/romanization-detector.js} +0 -0
package/README.md
CHANGED
@@ -193,71 +193,72 @@ const filter = new BeKind({
 });
 ```

-###
+### Alternative Library Comparison
+
+The main strength of be-kind comes from its dictionary and knowledge base. To give a fair comparison, **all benchmarks below inject be-kind's full 34K-word dictionary into every alternative library**, so the results compare **matching engines and detection features**, not dictionary coverage.

 Benchmarked on a single CPU core (pinned via `taskset -c 0`). All numbers are **ops/second — higher is better**.

->
+> [leo-profanity](https://github.com/jojoee/leo-profanity) ships with ~400 English words, [bad-words](https://github.com/web-mech/badwords) ships with ~400 English words, and [glin-profanity](https://www.glincker.com/tools/glin-profanity) loads its own 24-language dictionaries — all receive be-kind's 34K dictionary on top.

 | Library | Languages (out-of-the-box) | Leet-speak | Repeat compression | Context-aware |
 |---------|--------------------------|-----------|-------------------|--------------|
 | **be-kind** | 16 profanity dicts + 18-lang detection trie | ✅ | 🚧 planned | ✅ (certainty-delta) |
 | **be-kind (ctx)** | same as be-kind | ✅ | 🚧 planned | ✅ (boosters + reducers) |
 | [leo-profanity](https://github.com/jojoee/leo-profanity) + dict | 16 (via be-kind dict injection) | ❌ | ❌ | ❌ |
-| [bad-words](https://github.com/web-mech/badwords) | 1 (English) | ❌ | ❌ | ❌ |
 | [bad-words](https://github.com/web-mech/badwords) + dict | 16 (via be-kind dict injection) | ❌ | ❌ | ❌ |
-| [glin-profanity](https://www.glincker.com/tools/glin-profanity) | 24 | ✅ (3 levels) | ✅ | ✅ (heuristic) |
+| [glin-profanity](https://www.glincker.com/tools/glin-profanity) + dict | 24 + be-kind dict | ✅ (3 levels) | ✅ | ✅ (heuristic) |

-**Speed benchmark** — ops/second on a single CPU core (`taskset -c 0`), higher is better:
+**Speed benchmark** — ops/second on a single CPU core (`taskset -c 0`), higher is better. All competitors have be-kind's 34K dictionary injected:

-| Test | be-kind | be-kind (ctx) | leo
-|
-| check — clean (short) | 2,
-| check — profane (short) | 2,
-| check — leet-speak | 1,
-| clean — profane (short) | 2,
-| check — 500-char clean |
-| check — 500-char profane |
-| check — 2,500-char clean |
-| check — 2,500-char profane |
+| Test | be-kind | be-kind (ctx) | leo + dict | bad-words + dict | glin (basic) | glin (enhanced) |
+|------|--------:|--------------:|-----------:|-----------------:|-------------:|----------------:|
+| check — clean (short) | 2,625 | 3,007 | 932,597 | 29 | 68 | 68 |
+| check — profane (short) | 2,556 | 2,251 | 1,424,984 | 27 | 3,602 | 3,333 |
+| check — leet-speak | 1,407 | 1,324 | 1,540,700 | 26 | 2,791 | 4,350 |
+| clean — profane (short) | 2,499 | 2,243 | 372,049 | 2 | N/A | N/A |
+| check — 500-char clean | 409 | 427 | 110,318 | 17 | 21 | 22 |
+| check — 500-char profane | 357 | 314 | 217,347 | 17 | 828 | 718 |
+| check — 2,500-char clean | 88 | 90 | 21,727 | 10 | 6 | 6 |
+| check — 2,500-char profane | 79 | 69 | 47,966 | 9 | 192 | 165 |

 **Library versions tested:** `leo-profanity@1.9.0`, `bad-words@4.0.0`, `glin-profanity@3.3.0`

 **Notes:**

-- **
--
-- `
-- `
-- `
--
+- **All competitors have be-kind's 34K dictionary injected** to isolate matching-engine performance from dictionary coverage.
+- **be-kind** is **~39x faster than glin** on clean short text (2,625 vs 68 ops/s) with the same vocabulary. be-kind uses a **trie** (O(input_length) matching), while glin uses **linear scanning** (`for (const word of this.words.keys())` — O(dict_size * input_length)).
+- `be-kind (ctx)` adds ~10-15% overhead over default be-kind — context analysis (certainty-delta pattern matching) is cheap.
+- `leo + dict` is the fastest by a large margin but offers **no leet-speak, no context analysis, and no repeat compression** — it's a simple substring matcher. Its speed advantage comes from a flat array lookup with no normalization overhead.
+- `bad-words + dict` demonstrates the regex bottleneck catastrophically: 29 ops/s on clean short text vs 2,625 for be-kind — a **~90x slowdown**. bad-words creates a new `RegExp` per word in a `.filter()` loop ([source](https://github.com/web-mech/badwords/blob/master/src/badwords.ts#L91-L103)) — no short-circuiting, so clean and profane text perform identically (~27 ops/s). `clean()` drops to 2 ops/s (vs 2,499 for be-kind). This makes bad-words unsuitable for large multilingual dictionaries.
+- **glin with dict** collapses to 68 ops/s on clean short text (vs 2,625 for be-kind) — a **~39x slowdown** — demonstrating the linear-scan bottleneck at scale. glin short-circuits on first match, which explains the ~53x speedup on profane text (3,602 ops/s) vs clean text (68 ops/s).
 - be-kind is the only library with cross-language innocence scoring, romanization support, and context-aware certainty adjustment.

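The trie-versus-linear-scan point in the notes above can be sketched in a few lines. This is illustrative TypeScript, not be-kind's or glin's actual source; the dictionary and tokens are invented for the example:

```typescript
// Illustrative only: a trie lookup costs O(token length) regardless of how
// many words the dictionary holds, while a linear scan touches every entry.

type TrieNode = { children: Map<string, TrieNode>; terminal: boolean };

function buildTrie(words: string[]): TrieNode {
  const root: TrieNode = { children: new Map(), terminal: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), terminal: false });
      }
      node = node.children.get(ch)!;
    }
    node.terminal = true; // a complete dictionary word ends here
  }
  return root;
}

// Trie walk: cost depends only on the token being checked.
function trieHas(root: TrieNode, token: string): boolean {
  let node = root;
  for (const ch of token) {
    const next = node.children.get(ch);
    if (!next) return false;
    node = next;
  }
  return node.terminal;
}

// Linear scan: cost grows with dictionary size (the glin-style bottleneck).
function linearHas(words: string[], token: string): boolean {
  for (const word of words) if (word === token) return true;
  return false;
}

const dict = ["badword", "verybad", "awful"];
const trie = buildTrie(dict);
console.log(trieHas(trie, "badword")); // true
console.log(trieHas(trie, "kind"));    // false
console.log(linearHas(dict, "awful")); // true
```

With a 34K-word dictionary, the trie walk still does at most one map lookup per character of input, which is why the gap in the table widens as the vocabulary grows.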
-Run the benchmark yourself:
+Run the speed benchmark yourself:
 ```bash
 taskset -c 0 bun run benchmark:competitors
 ```

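For readers who want to reproduce an ops/second figure without the full harness, here is a minimal timed-loop sketch. It assumes a simple wall-clock loop, which may differ from the package's actual benchmark script:

```typescript
// Minimal ops/second harness sketch (an assumption, not the real
// benchmark:competitors script): warm up, then count iterations for a
// fixed wall-clock window and normalize to operations per second.
function opsPerSecond(fn: () => void, durationMs = 200): number {
  for (let i = 0; i < 100; i++) fn(); // warm-up so the JIT settles
  const start = Date.now();
  let ops = 0;
  while (Date.now() - start < durationMs) {
    fn();
    ops++;
  }
  return Math.round(ops / ((Date.now() - start) / 1000));
}

const sample = "a perfectly clean sentence";
const rate = opsPerSecond(() => sample.includes("badword"));
console.log(rate > 0); // true — some positive ops/s figure
```

Pinning the process to one core (as the README does with `taskset -c 0`) removes scheduler noise from numbers produced this way.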
 ### Accuracy Comparison

-Measures TP rate (recall), FP rate, and F1 across eight test categories (225 labeled cases, dataset v6). All libraries are tested against all categories — no exemptions. **Higher F1 and lower FP rate are better.**
+Measures TP rate (recall), FP rate, and F1 across eight test categories (225 labeled cases, dataset v6). All alternative libraries have be-kind's 34K dictionary injected. All libraries are tested against all categories — no exemptions. **Higher F1 and lower FP rate are better.**

 > **Bias disclaimer:** This dataset was created by the be-kind team. Non-English cases were likely drawn from or verified against be-kind's own dictionary, which advantages be-kind on those categories. To partially offset this, the dataset includes independent test cases from [glin-profanity's upstream test suite](https://github.com/GLINCKER/glin-profanity/tree/release/tests) and adversarial false-positive cases specifically chosen to expose known be-kind failures. We strongly recommend running this benchmark against your own dataset before drawing conclusions.

-> **Note:** `be-kind (sensitive)` = `sensitiveMode: true` (flags AMBIVALENT words too). `be-kind (ctx)` = `contextAnalysis.enabled: true`. `glin (collapsed)` = glin (basic) with `collapseRepeatedCharacters()` pre-processing.
+> **Note:** `be-kind (sensitive)` = `sensitiveMode: true` (flags AMBIVALENT words too). `be-kind (ctx)` = `contextAnalysis.enabled: true`. `glin (collapsed) + dict` = glin (basic) + dict with `collapseRepeatedCharacters()` pre-processing. All alternative libraries have be-kind's 34K dictionary injected.

 #### Single-language detection — 65 cases (English incl. leetspeak, French, German, Spanish, Hindi)

 | Library | Recall | Precision | FP Rate | F1 |
 |---|---|---|---|---|
 | be-kind (sensitive) | 100% | 100% | 0% | **1.00** |
+| bad-words + dict | 88% | 100% | 0% | 0.94 |
+| glin (enhanced) + dict | 88% | 100% | 0% | 0.94 |
+| glin (collapsed) + dict | 86% | 100% | 0% | 0.92 |
 | leo + dict | 82% | 100% | 0% | 0.90 |
 | be-kind | 80% | 100% | 0% | 0.89 |
 | be-kind (ctx) | 80% | 100% | 0% | 0.89 |
-| glin (enhanced) | 72% | 100% | 0% | 0.84 |
-| glin (collapsed) | 72% | 100% | 0% | 0.84 |
-| bad-words | 52% | 100% | 0% | 0.68 |

->
+> With be-kind's 34K dictionary injected, all alternatives improve dramatically. `bad-words + dict` and `glin (enhanced) + dict` both reach 88% recall (up from 52% and 72% without dict). be-kind in default mode misses mild words (`damn`, `hell`); `sensitiveMode: true` catches these. All libraries achieve 100% precision — when they flag something, it's always correct.

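Since this category includes leetspeak cases, a rough sketch of what leet-speak normalization involves may help. The character map below is a common subset chosen for illustration, not be-kind's actual substitution table:

```typescript
// Hedged sketch of leet-speak normalization: map common digit/symbol
// substitutions back to letters before dictionary matching.
// The mapping is an assumption for the example, not be-kind's real table.
const LEET: Record<string, string> = {
  "4": "a", "@": "a", "3": "e", "1": "i", "!": "i",
  "0": "o", "5": "s", "$": "s", "7": "t",
};

function normalizeLeet(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => LEET[ch] ?? ch) // unmapped characters pass through
    .join("");
}

console.log(normalizeLeet("sh1t"));     // "shit"
console.log(normalizeLeet("b4d w0rd")); // "bad word"
```

The trade-off: aggressive mapping turns innocent strings like product codes into pseudo-words, which is one way leet handling can raise false positives.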
 #### False positives / innocent words — 48 cases (clean only, lower FP rate is better)

@@ -265,15 +266,15 @@ Includes adversarial cases (`cum laude`, `Dick Van Dyke`, culinary `faggots`, Tu

 | Library | FP Rate |
 |---|---|
-|
-|
-| be-kind (ctx) | 21% |
-| bad-words | 23% |
-| leo + dict | 25% |
+| leo + dict | **25%** |
+| be-kind (ctx) | **25%** |
 | be-kind | 27% |
 | be-kind (sensitive) | 31% |
+| glin (enhanced) + dict | 31% |
+| glin (collapsed) + dict | 31% |
+| bad-words + dict | 33% |

->
+> With the full 34K dictionary injected, glin and bad-words now produce more false positives than before — their FP rates rise to 31-33% due to the larger vocabulary. `be-kind (ctx)` ties with `leo + dict` for the lowest FP rate (25%) thanks to context-aware certainty adjustment. be-kind's FP rate remains a significant weakness, but context analysis helps.

 #### Multi-language detection — 26 cases (Hinglish, French, German, Spanish, mixed)

@@ -282,40 +283,40 @@ Includes adversarial cases (`cum laude`, `Dick Van Dyke`, culinary `faggots`, Tu
 | be-kind | 100% | 100% | 0% | **1.00** |
 | be-kind (sensitive) | 100% | 100% | 0% | **1.00** |
 | leo + dict | 100% | 100% | 0% | **1.00** |
-|
-| glin (enhanced) |
-|
-|
+| bad-words + dict | 100% | 100% | 0% | **1.00** |
+| glin (enhanced) + dict | 100% | 100% | 0% | **1.00** |
+| be-kind (ctx) | 100% | 100% | 0% | **1.00** |
+| glin (collapsed) + dict | 100% | 100% | 0% | **1.00** |

-> With be-kind's dictionary injected,
+> With be-kind's 34K dictionary injected, **every library achieves 100% recall** — proving the dictionary is the sole differentiator for multi-language detection. The matching engine doesn't matter when the vocabulary is comprehensive enough.

 #### Romanization — 30 cases (Hinglish, Bengali, Tamil, Telugu, Japanese)

 | Library | Recall | Precision | FP Rate | F1 |
 |---|---|---|---|---|
+| glin (enhanced) + dict | 85% | 81% | 40% | **0.83** |
 | leo + dict | 75% | 94% | 10% | **0.83** |
 | be-kind | 80% | 84% | 30% | 0.82 |
 | be-kind (sensitive) | 80% | 84% | 30% | 0.82 |
 | be-kind (ctx) | 80% | 84% | 30% | 0.82 |
-|
-| glin (collapsed) |
-| bad-words | 0% | 0% | 10% | — |
+| bad-words + dict | 80% | 84% | 30% | 0.82 |
+| glin (collapsed) + dict | 80% | 84% | 30% | 0.82 |

->
+> With dict injection, `glin (enhanced) + dict` achieves the **highest recall** (85%) on romanization — glin's leet-speak detection catches additional transliterated variants. However, its FP rate (40%) is also the highest. `leo + dict` achieves the same F1 (0.83) with much better precision (94%) and lowest FP (10%). be-kind, bad-words + dict, and glin (collapsed) + dict all tie at 80% recall / 30% FP / F1=0.82, showing that the dictionary drives most romanization detection — not the matching engine.

 #### Semantic context — 25 cases

 | Library | Recall | Precision | FP Rate | F1 |
 |---|---|---|---|---|
-| be-kind (ctx) | 80% | 73% | 20% | **0.76** |
 | leo + dict | 100% | 59% | 47% | 0.74 |
-|
-| glin (collapsed) | 90% | 53% | 53% | 0.67 |
+| bad-words + dict | 100% | 48% | 73% | 0.65 |
 | be-kind (sensitive) | 100% | 48% | 73% | 0.65 |
-|
+| glin (enhanced) + dict | 100% | 48% | 73% | 0.65 |
+| glin (collapsed) + dict | 100% | 48% | 73% | 0.65 |
+| be-kind (ctx) | 80% | 62% | 47% | 0.64 |
 | be-kind | 80% | 47% | 60% | 0.59 |

-> Semantic context is where all libraries struggle — precision drops below 50% for most. Cases include metalinguistic uses, negation, and medical context. be-kind (ctx)
+> Semantic context is where all libraries struggle — precision drops below 50% for most. Cases include metalinguistic uses, negation, and medical context. With dict injection, bad-words + dict and glin now achieve 100% recall but at the cost of 73% FP rate. `be-kind (ctx)` trades lower recall (80%) for better precision (62%) and a lower FP rate (47%) via context-aware certainty adjustment — boosters confirm profane intent, reducers detect innocent contexts like proper nouns and medical terms.

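The booster/reducer mechanism described in the README's note can be sketched as a certainty adjustment driven by two pattern lists. The patterns and deltas below are invented for illustration; be-kind's real context analysis (its `context-patterns` module) is more involved:

```typescript
// Hedged sketch of context-aware certainty adjustment. REDUCERS model
// innocent contexts (medical, metalinguistic); BOOSTERS model hostile ones.
// Both lists and the 0.3 / 0.2 deltas are assumptions for this example.
const REDUCERS: RegExp[] = [/\bmedical\b/i, /\bthe word\b/i, /\brooster\b/i];
const BOOSTERS: RegExp[] = [/\byou\b/i, /\bshut up\b/i];

function adjustCertainty(base: number, sentence: string): number {
  let certainty = base;
  for (const p of REDUCERS) if (p.test(sentence)) certainty -= 0.3;
  for (const p of BOOSTERS) if (p.test(sentence)) certainty += 0.2;
  return Math.min(1, Math.max(0, certainty)); // clamp to [0, 1]
}

// Innocent context pulls the score down; hostile context pushes it up.
console.log(adjustCertainty(0.6, "the word appears in a medical text")); // 0
console.log(adjustCertainty(0.6, "shut up, you..."));
```

A match is then reported only when the adjusted certainty clears a threshold, which is how a filter can keep 80% recall while raising precision on sentences like "the medical term appears in the chart."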
 #### Repeated character evasion — 5 cases (elongated profanity)

@@ -323,13 +324,13 @@ No clean cases in this category — FP rate is undefined.

 | Library | Recall | Precision |
 |---|---|---|
-| glin (enhanced) | **100%** | 100% |
-| glin (collapsed) | 40% | 100% |
+| glin (enhanced) + dict | **100%** | 100% |
+| glin (collapsed) + dict | 40% | 100% |
 | be-kind | 0% | — |
 | be-kind (sensitive) | 0% | — |
 | be-kind (ctx) | 0% | — |
 | leo + dict | 0% | — |
-| bad-words | 0% | — |
+| bad-words + dict | 0% | — |

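A naive version of repeat compression (assuming glin's `collapseRepeatedCharacters()` follows the same idea — this is not its actual source) collapses any character run to a single character. That also destroys words that legitimately contain doubled letters, which is one plausible reason the collapsed variant only reaches 40% recall here:

```typescript
// Naive repeat compression: collapse every run of the same character to one.
// Assumption: illustrates the idea behind collapseRepeatedCharacters(),
// not glin's exact implementation.
function collapseRepeats(text: string): string {
  return text.replace(/(.)\1+/g, "$1");
}

console.log(collapseRepeats("fuuuuck")); // "fuck" — evasion defeated
console.log(collapseRepeats("hello"));   // "helo" — legit double letters mangled
console.log(collapseRepeats("asssss"));  // "as"   — the target word itself is destroyed
```

A smarter pass collapses runs of three-plus characters to two and checks both candidates, at the cost of extra lookups per token.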
 #### Concatenated / no-space evasion — 7 cases (profanity embedded in concatenated strings)

@@ -338,10 +339,10 @@ No clean cases in this category — FP rate is undefined.
 | be-kind | 20% | 100% | 0% | 0.33 |
 | be-kind (sensitive) | 20% | 100% | 0% | 0.33 |
 | be-kind (ctx) | 20% | 100% | 0% | 0.33 |
+| bad-words + dict | 20% | 100% | 0% | 0.33 |
+| glin (enhanced) + dict | 20% | 100% | 0% | 0.33 |
+| glin (collapsed) + dict | 20% | 100% | 0% | 0.33 |
 | leo + dict | 0% | — | 0% | — |
-| bad-words | 0% | — | 0% | — |
-| glin (enhanced) | 0% | — | 0% | — |
-| glin (collapsed) | 0% | — | 0% | — |

 #### Challenge cases — 19 cases (semantic disambiguation, embedded substrings, separator evasion)

@@ -349,29 +350,31 @@ Hard problems: `cock` as rooster, `ass` as donkey, Turkish `got` = "buttocks" vs

 | Library | Recall | Precision | FP Rate | F1 |
 |---|---|---|---|---|
-| be-kind (ctx) | 60% | 75% |
+| be-kind (ctx) | 60% | 75% | 33% | **0.63** |
 | be-kind | 60% | 60% | 44% | 0.60 |
 | be-kind (sensitive) | 60% | 60% | 44% | 0.60 |
-| glin (enhanced) |
+| glin (enhanced) + dict | 60% | 60% | 44% | 0.60 |
+| bad-words + dict | 50% | 56% | 44% | 0.53 |
+| glin (collapsed) + dict | 50% | 56% | 44% | 0.53 |
 | leo + dict | 20% | 50% | 22% | 0.29 |
-| bad-words | 20% | 33% | 44% | 0.25 |
-| glin (collapsed) | 0% | 0% | 44% | — |

-> be-kind (ctx)
+> be-kind (ctx) achieves the best F1 on challenge cases thanks to context-aware certainty adjustment — recognizing innocent contexts like "cock crowed at dawn" and "wild ass is an equine." With dict injection, glin (enhanced) + dict now matches be-kind's recall (60%) but at higher FP (44% vs 33%). Separator-spaced evasion cases (`f u c k`, `f_u*c k`, mixed separators) test features that no alternative library supports. These cases still require semantic understanding that no dictionary-based filter can fully solve — the strongest argument for LLM-assisted moderation as a second pass.

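The separator-spaced evasion cases mentioned above can be illustrated with a normalization pass that strips separator characters before matching. This is a sketch, not be-kind's implementation — a real filter must also avoid gluing adjacent innocent words into an accidental match:

```typescript
// Sketch: remove whitespace and common separator characters so that
// "f u c k" or "f_u*c k" reduces to a single matchable token.
// The separator set is an assumption for this example.
function stripSeparators(text: string): string {
  return text.toLowerCase().replace(/[\s_\-*.|]+/g, "");
}

console.log(stripSeparators("f u c k")); // "fuck"
console.log(stripSeparators("f_u*c k")); // "fuck"
```

The downside is visible immediately: "grass eater" also becomes "grasseater", so a separator-stripped pass is usually run as a secondary check rather than the primary one.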
 #### Overall summary — micro-averaged across all 225 cases

+All alternative libraries have be-kind's 34K dictionary injected.
+
 | Library | Recall | Precision | FP Rate | F1 | TP | FN | FP | TN |
 |---|---|---|---|---|---|---|---|---|
-| be-kind (sensitive) | **86%** | 76% | 32% | 0.81 | 104 | 17 | 33 | 71 |
-|
+| be-kind (sensitive) | **86%** | 76% | 32% | **0.81** | 104 | 17 | 33 | 71 |
+| glin (enhanced) + dict | **86%** | 75% | 33% | 0.80 | 104 | 17 | 34 | 70 |
+| glin (collapsed) + dict | 81% | 75% | 32% | 0.78 | 98 | 23 | 33 | 71 |
+| bad-words + dict | 80% | 74% | 33% | 0.77 | 97 | 24 | 34 | 70 |
+| leo + dict | 74% | 80% | 21% | 0.77 | 89 | 32 | 22 | 82 |
+| be-kind (ctx) | 76% | **79%** | **24%** | 0.77 | 92 | 29 | 25 | 79 |
 | be-kind | 76% | 76% | 28% | 0.76 | 92 | 29 | 29 | 75 |
-| leo + dict | 74% | 80% | 21% | 0.76 | 89 | 32 | 22 | 82 |
-| glin (enhanced) | 63% | 78% | 21% | 0.70 | 76 | 45 | 22 | 82 |
-| glin (collapsed) | 58% | 77% | 20% | 0.66 | 70 | 51 | 21 | 83 |
-| bad-words | 42% | 65% | 26% | 0.51 | 51 | 70 | 27 | 77 |

-> Micro-averaged: all 225 cases (121 profane, 104 clean) aggregated into one confusion matrix per library, then recall/precision/F1 computed once. No category weighting artifacts.
+> Micro-averaged: all 225 cases (121 profane, 104 clean) aggregated into one confusion matrix per library, then recall/precision/F1 computed once. No category weighting artifacts. With be-kind's dictionary injected, **glin (enhanced) + dict matches be-kind (sensitive) on recall (86%)** and nearly matches on F1 (0.80 vs 0.81) — proving the dictionary is the core differentiator, not the matching engine. `leo + dict` and `be-kind (ctx)` tie for best precision (79-80%) and lowest FP rates (21-24%). be-kind (ctx) achieves this through context-aware certainty adjustment; leo achieves it through simpler matching that avoids over-triggering.

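The micro-averaged figures can be re-derived from the raw confusion-matrix counts in the table. A small sketch, checked against the be-kind (sensitive) row:

```typescript
// Recompute recall / precision / FP rate / F1 from confusion-matrix counts,
// matching the micro-averaging the README describes.
function metrics(tp: number, fn: number, fp: number, tn: number) {
  const recall = tp / (tp + fn);    // share of profane cases caught
  const precision = tp / (tp + fp); // share of flags that were correct
  const fpRate = fp / (fp + tn);    // share of clean cases wrongly flagged
  const f1 = (2 * precision * recall) / (precision + recall);
  return { recall, precision, fpRate, f1 };
}

// be-kind (sensitive) row from the table: TP=104, FN=17, FP=33, TN=71
const m = metrics(104, 17, 33, 71);
console.log(Math.round(m.recall * 100));    // 86
console.log(Math.round(m.precision * 100)); // 76
console.log(Math.round(m.fpRate * 100));    // 32
console.log(m.f1.toFixed(2));               // "0.81"
```

Because the counts are aggregated first and the ratios computed once, categories with many cases weigh more than small ones — the "no category weighting artifacts" property the note mentions.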
 Run the accuracy benchmark yourself:
 ```bash