glin-profanity 2.3.8 → 3.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +197 -0
- package/dist/chunk-KNHWF6MX.js +5050 -0
- package/dist/index.cjs +2041 -272
- package/dist/index.d.cts +252 -87
- package/dist/index.d.ts +252 -87
- package/dist/index.js +50 -3306
- package/dist/ml/index.cjs +5513 -0
- package/dist/ml/index.d.cts +357 -0
- package/dist/ml/index.d.ts +357 -0
- package/dist/ml/index.js +557 -0
- package/dist/types-BgQe4FSE.d.cts +350 -0
- package/dist/types-BgQe4FSE.d.ts +350 -0
- package/package.json +38 -3
package/README.md
CHANGED
@@ -28,6 +28,9 @@
 <a href="https://www.npmjs.com/package/glin-profanity">
 <img src="https://img.shields.io/npm/dw/glin-profanity" alt="Weekly Downloads" />
 </a>
+<a href="https://pepy.tech/projects/glin-profanity">
+<img src="https://static.pepy.tech/personalized-badge/glin-profanity?period=total&units=international_system&left_color=black&right_color=green&left_text=Python%20Downloads" alt="PyPI Downloads" />
+</a>
 <a href="https://github.com/GLINCKER/glin-profanity/issues">
 <img src="https://img.shields.io/github/issues/GLINCKER/glin-profanity" alt="Open Issues" />
 </a>
@@ -81,8 +84,29 @@ Whether you're moderating chat messages, community forums, or content input form
 <img src="https://img.shields.io/badge/Real--Time-⚡-yellow?style=for-the-badge" alt="Real-Time" />
 <img src="https://img.shields.io/badge/Obfuscation_Detection-🕵️-purple?style=for-the-badge" alt="Obfuscation" />
 <img src="https://img.shields.io/badge/Framework_Agnostic-🧩-green?style=for-the-badge" alt="Framework Agnostic" />
+<img src="https://img.shields.io/badge/ML_Powered-🤖-orange?style=for-the-badge" alt="ML Powered" />
 </div>
 
+### 💡 Why glin-profanity?
+
+| | |
+|---|---|
+| 🔒 **Privacy First** | Runs entirely on-device. No API calls, no data leaves your app. GDPR/CCPA friendly. |
+| ⚡ **Blazing Fast** | 23K-115K ops/sec rule-based, 21M+ ops/sec with caching. Sub-millisecond latency. |
+| 🌍 **Truly Multilingual** | 23 languages with unified dictionary. Consistent detection across locales. |
+| 🛡️ **Evasion Resistant** | Catches leetspeak (`f4ck`), Unicode tricks (`fυck`), zero-width chars, and homoglyphs. |
+| 🤖 **AI-Ready** | Optional ML integration for context-aware toxicity detection beyond keywords. |
+| 🧩 **Zero Config** | Works out of the box. No API keys, no server, no setup required. |
+| 📦 **Lightweight** | ~90KB core bundle. Tree-shakeable. No heavy dependencies for basic usage. |
+
+### ✨ What's New in v3.0
+
+- **Leetspeak Detection** — Catch `f4ck`, `@ss`, `$h!t` with 3 intensity levels
+- **Unicode Normalization** — Detect Cyrillic/Greek lookalikes, full-width chars, zero-width spaces
+- **Result Caching** — 800x speedup for repeated checks
+- **ML Integration** — Optional TensorFlow.js toxicity model for nuanced detection
+- **Performance** — Optimized for high-throughput production workloads
+
 ## 📚 Table of Contents
 
 - [🚀 Key Features](#-key-features)
@@ -104,6 +128,13 @@ Whether you're moderating chat messages, community forums, or content input form
 - [Return Value](#return-value)
 - [⚠️ Note](#note)
 - [🛠 Use Cases](#-use-cases)
+- [🔬 Advanced Features](#-advanced-features)
+- [Leetspeak Detection](#leetspeak-detection)
+- [Unicode Normalization](#unicode-normalization)
+- [Result Caching](#result-caching)
+- [Configuration Management](#configuration-management)
+- [ML-Based Detection](#ml-based-detection)
+- [📊 Benchmarks](#-benchmarks)
 - [📄 License](#license)
 - [MIT License](#mit-license)
 
@@ -357,6 +388,11 @@ new Filter(config?: FilterConfig);
 | `autoReplace` | `boolean` | Whether to auto-replace flagged words |
 | `minSeverity` | `SeverityLevel` | Minimum severity to include in final list |
 | `customActions` | `(result) => void` | Custom logging/callback support |
+| `detectLeetspeak` | `boolean` | Enable leetspeak detection (e.g., `f4ck` → `fuck`) |
+| `leetspeakLevel` | `'basic' \| 'moderate' \| 'aggressive'` | Leetspeak detection intensity |
+| `normalizeUnicode` | `boolean` | Enable Unicode normalization for homoglyphs |
+| `cacheResults` | `boolean` | Cache results for repeated checks |
+| `maxCacheSize` | `number` | Maximum cache size (default: 1000) |
 
 ---
 
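The five new rows extend `FilterConfig` with optional switches. As a rough illustration of how such options could be typed and resolved against defaults, here is a standalone sketch. The `AdvancedFilterOptions` interface and `resolveOptions` helper are hypothetical, not glin-profanity's own types; only `maxCacheSize: 1000` and `normalizeUnicode: true` come from the README, and every other default is an assumption.

```typescript
// Hypothetical mirror of the options added to the FilterConfig table (not the library's type).
type LeetspeakLevel = 'basic' | 'moderate' | 'aggressive';

interface AdvancedFilterOptions {
  detectLeetspeak?: boolean;
  leetspeakLevel?: LeetspeakLevel;
  normalizeUnicode?: boolean;
  cacheResults?: boolean;
  maxCacheSize?: number;
}

const DEFAULT_OPTIONS: Required<AdvancedFilterOptions> = {
  detectLeetspeak: false,     // assumed default
  leetspeakLevel: 'moderate', // assumed default
  normalizeUnicode: true,     // README: "enabled by default"
  cacheResults: false,        // assumed default
  maxCacheSize: 1000,         // README: "default: 1000"
};

// Fill in missing options; `??` also treats an explicit `undefined` as "use the default".
function resolveOptions(options: AdvancedFilterOptions = {}): Required<AdvancedFilterOptions> {
  const resolved: Required<AdvancedFilterOptions> = {
    detectLeetspeak: options.detectLeetspeak ?? DEFAULT_OPTIONS.detectLeetspeak,
    leetspeakLevel: options.leetspeakLevel ?? DEFAULT_OPTIONS.leetspeakLevel,
    normalizeUnicode: options.normalizeUnicode ?? DEFAULT_OPTIONS.normalizeUnicode,
    cacheResults: options.cacheResults ?? DEFAULT_OPTIONS.cacheResults,
    maxCacheSize: options.maxCacheSize ?? DEFAULT_OPTIONS.maxCacheSize,
  };
  // A cache that can never hold an entry is almost certainly a configuration mistake.
  if (!Number.isInteger(resolved.maxCacheSize) || resolved.maxCacheSize < 1) {
    throw new RangeError(`maxCacheSize must be a positive integer, got ${resolved.maxCacheSize}`);
  }
  return resolved;
}
```

Resolving once up front keeps the rest of a filter free of repeated `undefined` checks.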
@@ -419,6 +455,167 @@ const { result, checkText, checkTextAsync, reset, isDirty, isWordProfane } = use
 - 🕹️ Game lobbies & multiplayer chats
 - 🤖 AI content filters before processing input
 
+## 🔬 Advanced Features
+
+### Leetspeak Detection
+
+Detect and normalize leetspeak variations like `f4ck`, `@ss`, `$h!t`:
+
+```typescript
+import { Filter } from 'glin-profanity';
+
+const filter = new Filter({
+  languages: ['english'],
+  detectLeetspeak: true,
+  leetspeakLevel: 'moderate', // 'basic' | 'moderate' | 'aggressive'
+});
+
+filter.isProfane('f4ck'); // true
+filter.isProfane('@ss'); // true
+filter.isProfane('$h!t'); // true
+filter.isProfane('f u c k'); // true (spaced characters)
+```
+
+**Leetspeak Levels:**
+- `basic`: Numbers only (0→o, 1→i, 3→e, 4→a, 5→s)
+- `moderate`: Basic + common symbols (@→a, $→s, !→i)
+- `aggressive`: All known substitutions including rare ones
+
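The three levels describe progressively larger character-substitution tables, plus the separate trick of spacing letters apart. A minimal sketch of that idea, assuming a simple one-to-one map per level: the `basic` and `moderate` entries are copied from the list, while the extra `aggressive` entries and the `normalizeLeet` / `collapseSpacedLetters` names are illustrative assumptions, not the library's internals.

```typescript
type LeetLevel = 'basic' | 'moderate' | 'aggressive';

// `basic` and `moderate` entries are taken from the README's level list.
const BASIC = new Map<string, string>([
  ['0', 'o'], ['1', 'i'], ['3', 'e'], ['4', 'a'], ['5', 's'],
]);
const MODERATE = new Map<string, string>([...BASIC, ['@', 'a'], ['$', 's'], ['!', 'i']]);
// Assumed extras: the README only says "all known substitutions including rare ones".
const AGGRESSIVE = new Map<string, string>([
  ...MODERATE, ['7', 't'], ['8', 'b'], ['9', 'g'], ['|', 'l'], ['+', 't'],
]);

const TABLES: Record<LeetLevel, Map<string, string>> = {
  basic: BASIC,
  moderate: MODERATE,
  aggressive: AGGRESSIVE,
};

// Merge runs of single-character tokens: "h e l l o there" -> "hello there".
function collapseSpacedLetters(text: string): string {
  const out: string[] = [];
  let run = '';
  for (const token of text.trim().split(/\s+/)) {
    if (token.length === 1) {
      run += token;
      continue;
    }
    if (run) out.push(run);
    run = '';
    out.push(token);
  }
  if (run) out.push(run);
  return out.join(' ');
}

// Hypothetical helper: lowercase, substitute per level, then rejoin spaced letters.
function normalizeLeet(text: string, level: LeetLevel = 'moderate'): string {
  const table = TABLES[level];
  const substituted = Array.from(text.toLowerCase(), (ch) => table.get(ch) ?? ch).join('');
  return collapseSpacedLetters(substituted);
}
```

Both steps over-normalize ordinary text (`hi!` becomes `hii` at `moderate`, and genuine one-letter words get merged), so a matcher would likely check the normalized string in addition to the original rather than instead of it.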
+### Unicode Normalization
+
+Detect homoglyphs and Unicode obfuscation:
+
+```typescript
+import { Filter } from 'glin-profanity';
+
+const filter = new Filter({
+  languages: ['english'],
+  normalizeUnicode: true, // enabled by default
+});
+
+// Detects various Unicode tricks:
+filter.isProfane('fυck'); // true (Greek upsilon υ → u)
+filter.isProfane('fᴜck'); // true (Small caps ᴜ → u)
+filter.isProfane('f​u​c​k'); // true (Zero-width spaces removed)
+filter.isProfane('ｆｕｃｋ'); // true (Full-width characters)
+```
+
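The tricks listed above fall into three mechanisms: invisible characters that can simply be stripped, compatibility forms such as full-width letters that Unicode NFKC normalization already folds to ASCII, and cross-script lookalikes (Greek υ, small-capital ᴜ, Cyrillic letters) that NFKC leaves alone and that therefore need an explicit map. A minimal sketch using the standard `String.prototype.normalize`; `foldConfusables` is a hypothetical name, and the homoglyph table is a small illustrative sample rather than what glin-profanity ships (a fuller list could be derived from Unicode's `confusables.txt`, UTS #39).

```typescript
// Zero-width space, non-joiner, joiner, word joiner, and byte-order mark.
const INVISIBLE = /[\u200B-\u200D\u2060\uFEFF]/g;

// Illustrative sample of lookalikes that NFKC does not map to Latin letters.
const HOMOGLYPHS = new Map<string, string>([
  ['\u03C5', 'u'], // Greek small upsilon υ
  ['\u03BF', 'o'], // Greek small omicron ο
  ['\u1D1C', 'u'], // Latin letter small capital ᴜ
  ['\u0430', 'a'], // Cyrillic а
  ['\u0435', 'e'], // Cyrillic е
  ['\u043E', 'o'], // Cyrillic о
  ['\u0440', 'p'], // Cyrillic р
  ['\u0441', 'c'], // Cyrillic с
]);

// Hypothetical helper: strip invisibles, fold compatibility forms, then map lookalikes.
function foldConfusables(text: string): string {
  const visible = text.replace(INVISIBLE, '');
  // NFKC turns full-width letters such as 'ｈ' (U+FF48) into plain 'h'.
  const compatible = visible.normalize('NFKC').toLowerCase();
  return Array.from(compatible, (ch) => HOMOGLYPHS.get(ch) ?? ch).join('');
}
```

Lowercasing before the lookup means uppercase lookalikes such as Cyrillic `Е` reach the lowercase entries, which keeps the table half the size.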
+### Result Caching
+
+Enable caching for high-performance repeated checks:
+
+```typescript
+import { Filter } from 'glin-profanity';
+
+const filter = new Filter({
+  languages: ['english'],
+  cacheResults: true,
+  maxCacheSize: 1000, // LRU eviction when full
+});
+
+// First call computes result
+filter.checkProfanity('hello world'); // ~0.04ms
+
+// Subsequent calls return cached result
+filter.checkProfanity('hello world'); // ~0.00005ms (800x faster!)
+
+// Cache management
+console.log(filter.getCacheSize()); // 1
+filter.clearCache();
+```
+
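The `maxCacheSize` comment says eviction is least-recently-used. A common way to build an LRU cache in JavaScript relies on `Map` iterating in insertion order: deleting and re-inserting a key on every hit moves it to the back, so the first key is always the least recently used. (Calling `set` on an existing key does not move it, hence the delete.) This is a sketch of that technique wrapped around an arbitrary check function; `LruCache` and `memoizeCheck` are hypothetical names, and the library's own cache may be implemented differently.

```typescript
// Least-recently-used cache built on Map's insertion-order iteration.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize = 1000) {
    if (!Number.isInteger(maxSize) || maxSize < 1) {
      throw new RangeError('maxSize must be a positive integer');
    }
    this.maxSize = maxSize;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key moves to the most-recently-used end.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key); // no-op when absent; resets position when present
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }

  clear(): void {
    this.entries.clear();
  }
}

// Hypothetical helper: cache the results of a pure check function, keyed by input text.
function memoizeCheck<R>(check: (text: string) => R, maxSize = 1000) {
  const cache = new LruCache<string, R>(maxSize);
  const cached = (text: string): R => {
    if (cache.has(text)) return cache.get(text) as R;
    const result = check(text);
    cache.set(text, result);
    return result;
  };
  return { cached, cache };
}
```

Keying on the exact input string is what makes the hit path so cheap: a hash lookup replaces normalization and dictionary matching entirely.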
+### Configuration Management
+
+Export and import filter configurations for sharing between environments:
+
+```typescript
+import { Filter } from 'glin-profanity';
+
+const filter = new Filter({
+  languages: ['english', 'spanish'],
+  detectLeetspeak: true,
+  leetspeakLevel: 'aggressive',
+  cacheResults: true,
+});
+
+// Export configuration
+const config = filter.getConfig();
+// Save to file: fs.writeFileSync('filter.config.json', JSON.stringify(config));
+
+// Later, restore configuration
+// const savedConfig = JSON.parse(fs.readFileSync('filter.config.json'));
+// const restoredFilter = new Filter(savedConfig);
+
+// Get dictionary size for monitoring
+console.log(filter.getWordCount()); // 406
+```
+
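The commented-out restore step passes the `Buffer` returned by `fs.readFileSync` straight to `JSON.parse`. Plain JavaScript coerces it to a string, but TypeScript rejects the call because `JSON.parse` expects a `string`. A sketch of the save/restore round trip with the encoding made explicit, assuming the exported configuration is a plain JSON-serializable object; `saveConfig`, `loadConfig`, and the `SavedFilterConfig` shape are illustrative and not part of glin-profanity. With the real library, the README's flow would pass `filter.getConfig()` in and hand the loaded object to `new Filter(...)`.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Hypothetical shape covering the options used in the example above.
interface SavedFilterConfig {
  languages: string[];
  detectLeetspeak?: boolean;
  leetspeakLevel?: 'basic' | 'moderate' | 'aggressive';
  cacheResults?: boolean;
}

function saveConfig(path: string, config: SavedFilterConfig): void {
  // Pretty-print so the file diffs cleanly when committed alongside other environments.
  writeFileSync(path, JSON.stringify(config, null, 2), 'utf8');
}

function loadConfig(path: string): SavedFilterConfig {
  // Passing 'utf8' makes readFileSync return a string, which JSON.parse expects.
  const parsed: unknown = JSON.parse(readFileSync(path, 'utf8'));
  // Minimal shape check before trusting data read from disk.
  if (
    typeof parsed !== 'object' ||
    parsed === null ||
    !Array.isArray((parsed as { languages?: unknown }).languages)
  ) {
    throw new Error(`${path} does not look like a saved filter config`);
  }
  return parsed as SavedFilterConfig;
}
```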
+### ML-Based Detection
+
+Optional TensorFlow.js-powered toxicity detection for context-aware filtering:
+
+```bash
+# Install optional dependencies
+npm install @tensorflow/tfjs @tensorflow-models/toxicity
+```
+
+```typescript
+import { HybridFilter } from 'glin-profanity/ml';
+
+const filter = new HybridFilter({
+  languages: ['english'],
+  detectLeetspeak: true,
+  enableML: true,
+  mlThreshold: 0.85,
+  combinationMode: 'or', // 'or' | 'and' | 'ml-override' | 'rules-first'
+});
+
+// Initialize ML model (async)
+await filter.initialize();
+
+// Hybrid check (rules + ML)
+const result = await filter.checkProfanityAsync('you are terrible');
+console.log(result.isToxic); // true
+console.log(result.mlResult?.matchedCategories); // ['insult', 'toxicity']
+console.log(result.confidence); // 0.92
+
+// Sync rule-based check (fast, no ML)
+filter.isProfane('badword'); // true
+```
+
+**ML Categories Detected:**
+- `toxicity` - General toxic content
+- `insult` - Insults and personal attacks
+- `threat` - Threatening language
+- `obscene` - Obscene/vulgar content
+- `identity_attack` - Identity-based hate
+- `sexual_explicit` - Sexually explicit content
+- `severe_toxicity` - Highly toxic content
+
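The README lists four `combinationMode` values but does not define them. One plausible reading, sketched here purely as an assumption: `or` flags text when either signal fires, `and` only when both do, `ml-override` lets the model's verdict win outright, and `rules-first` treats a rule hit as final (skipping the slower model call) and otherwise defers to the model. A category counts as matched when its probability reaches `mlThreshold`. `combineVerdicts`, `decide`, and the scorer callback are hypothetical and may not match how `HybridFilter` actually behaves.

```typescript
type CombinationMode = 'or' | 'and' | 'ml-override' | 'rules-first';

// Per-label probabilities, e.g. { insult: 0.93, toxicity: 0.91, threat: 0.02 }.
type CategoryScores = Record<string, number>;

interface HybridVerdict {
  isToxic: boolean;
  matchedCategories: string[];
  usedModel: boolean;
}

function labelsAbove(scores: CategoryScores, threshold: number): string[] {
  return Object.entries(scores)
    .filter(([, probability]) => probability >= threshold)
    .map(([label]) => label);
}

// Assumed semantics for each mode once both signals are available.
function decide(mode: CombinationMode, rulesHit: boolean, mlHit: boolean): boolean {
  switch (mode) {
    case 'or':
      return rulesHit || mlHit;
    case 'and':
      return rulesHit && mlHit;
    case 'ml-override':
      return mlHit; // the model's verdict wins outright
    case 'rules-first':
      return rulesHit || mlHit; // only reached when rules missed, so the model decides
    default: {
      const unreachable: never = mode;
      throw new Error(`Unknown combination mode: ${String(unreachable)}`);
    }
  }
}

async function combineVerdicts(
  rulesHit: boolean,
  scoreWithModel: () => Promise<CategoryScores>, // hypothetical ML scorer
  mode: CombinationMode,
  mlThreshold = 0.85,
): Promise<HybridVerdict> {
  // Assumed: under 'rules-first' a rule hit is final, so the model is never called.
  if (mode === 'rules-first' && rulesHit) {
    return { isToxic: true, matchedCategories: [], usedModel: false };
  }
  const matchedCategories = labelsAbove(await scoreWithModel(), mlThreshold);
  const mlHit = matchedCategories.length > 0;
  return { isToxic: decide(mode, rulesHit, mlHit), matchedCategories, usedModel: true };
}
```

Under this reading `or` and `rules-first` always agree on the verdict; the difference is cost, since `rules-first` avoids running the model whenever the dictionary already matched.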
+## 📊 Benchmarks
+
+Performance benchmarks on a MacBook Pro (M1):
+
+| Operation | Throughput | Average Time |
+|-----------|------------|--------------|
+| `isProfane` (clean text) | 23,524 ops/sec | 0.04ms |
+| `isProfane` (profane text) | 114,666 ops/sec | 0.009ms |
+| With leetspeak detection | 22,904 ops/sec | 0.04ms |
+| With Unicode normalization | 24,058 ops/sec | 0.04ms |
+| With caching (cached hit) | **21,396,095 ops/sec** | 0.00005ms |
+| `checkProfanity` (detailed) | 3,677 ops/sec | 0.27ms |
+| Multi-language (4 langs) | 24,855 ops/sec | 0.04ms |
+| All languages (23 langs) | 14,114 ops/sec | 0.07ms |
+
+**Key Findings:**
+- Leetspeak and Unicode normalization add minimal overhead
+- Caching provides **800x speedup** for repeated checks
+- Multi-language support scales well
+
+Run benchmarks yourself:
+```bash
+npm run benchmark
+```
 
 ## License
 
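In the benchmark table, the two columns of each row are the same measurement viewed two ways, since ops/sec = 1000 / (average ms); for example, 1000 / 0.27 ms ≈ 3,700 ops/sec for `checkProfanity`. For timing your own inputs outside `npm run benchmark`, here is a rough micro-benchmark sketch using the standard `performance.now()` timer. `measure` is a hypothetical helper, not the package's benchmark script, and the numbers it produces will vary with hardware, runtime, and input text.

```typescript
interface BenchResult {
  opsPerSec: number;
  avgMs: number;
}

// Hypothetical helper: average the cost of `fn` over many calls after a warm-up phase.
function measure(fn: () => unknown, iterations = 10_000, warmup = 1_000): BenchResult {
  for (let i = 0; i < warmup; i++) fn(); // give the JIT a chance to optimize the hot path
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  const avgMs = elapsedMs / iterations;
  // Throughput is the reciprocal of the average time per call.
  return { avgMs, opsPerSec: avgMs > 0 ? 1000 / avgMs : Number.POSITIVE_INFINITY };
}
```

Sub-microsecond operations such as cache hits sit close to the timer's resolution, so they need many more iterations than a full `checkProfanity` call to give a stable average.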