allprofanity 2.1.1 → 2.2.0
- package/CONTRIBUTORS.md +106 -0
- package/README.md +354 -26
- package/allprofanity.config.example.json +35 -0
- package/bin/init.js +49 -0
- package/config.schema.json +163 -0
- package/dist/algos/aho-corasick.d.ts +75 -0
- package/dist/algos/aho-corasick.js +238 -0
- package/dist/algos/aho-corasick.js.map +1 -0
- package/dist/algos/bloom-filter.d.ts +103 -0
- package/dist/algos/bloom-filter.js +208 -0
- package/dist/algos/bloom-filter.js.map +1 -0
- package/dist/algos/context-patterns.d.ts +88 -0
- package/dist/algos/context-patterns.js +298 -0
- package/dist/algos/context-patterns.js.map +1 -0
- package/dist/index.d.ts +53 -0
- package/dist/index.js +232 -8
- package/dist/index.js.map +1 -1
- package/dist/languages/brazilian-words.d.ts +7 -0
- package/dist/languages/brazilian-words.js +207 -0
- package/dist/languages/brazilian-words.js.map +1 -0
- package/package.json +23 -7
package/CONTRIBUTORS.md
ADDED
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
# Contributors
|
|
2
|
+
|
|
3
|
+
Welcome to the allProfanity project! We're excited to have you here and grateful for your interest in contributing to making profanity detection better for everyone.
|
|
4
|
+
|
|
5
|
+
## Our Contributors
|
|
6
|
+
|
|
7
|
+
Thank you to all the amazing people who have contributed to this project:
|
|
8
|
+
|
|
9
|
+
<!-- Add your name below this line -->
|
|
10
|
+
- Your Name (@your-github-username)
|
|
11
|
+
<!-- Keep the list alphabetically sorted -->
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## How to Contribute
|
|
16
|
+
|
|
17
|
+
We welcome contributions of all kinds! Here are some ways you can help:
|
|
18
|
+
|
|
19
|
+
- Add support for new languages
|
|
20
|
+
- Improve existing word lists
|
|
21
|
+
- Enhance detection algorithms
|
|
22
|
+
- Fix bugs
|
|
23
|
+
- Improve documentation
|
|
24
|
+
- Add or improve test cases
|
|
25
|
+
|
|
26
|
+
## Adding a New Language
|
|
27
|
+
|
|
28
|
+
When adding support for a new language, please follow these important guidelines:
|
|
29
|
+
|
|
30
|
+
### Required Steps:
|
|
31
|
+
|
|
32
|
+
1. **Add the word list** for the new language in the appropriate location
|
|
33
|
+
2. **Update the configuration** to include the new language
|
|
34
|
+
3. **Write comprehensive test cases** for the new language
|
|
35
|
+
- Include tests for common profane words
|
|
36
|
+
- Include tests for edge cases
|
|
37
|
+
- Include tests for false positives (words that should NOT be flagged)
|
|
38
|
+
4. **Run all tests** and ensure they pass
|
|
39
|
+
5. **Take a screenshot** of the passing tests
|
|
40
|
+
6. **Attach the screenshot** to your Pull Request
|
|
41
|
+
|
|
42
|
+
### Example Test Structure:
|
|
43
|
+
|
|
44
|
+
```typescript
|
|
45
|
+
describe('Language: YourLanguage', () => {
|
|
46
|
+
it('should detect profanity', () => {
|
|
47
|
+
// Your test cases
|
|
48
|
+
});
|
|
49
|
+
|
|
50
|
+
it('should not flag clean words', () => {
|
|
51
|
+
// Your test cases
|
|
52
|
+
});
|
|
53
|
+
});
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## Pull Request Guidelines
|
|
57
|
+
|
|
58
|
+
When submitting a Pull Request, please:
|
|
59
|
+
|
|
60
|
+
1. **Provide a clear description** of what your PR does
|
|
61
|
+
2. **Reference any related issues** (e.g., "Fixes #123")
|
|
62
|
+
3. **Include test results** - attach a screenshot showing all tests passing
|
|
63
|
+
4. **Follow the existing code style** and conventions
|
|
64
|
+
5. **Keep changes focused** - one feature/fix per PR
|
|
65
|
+
6. **Update documentation** if you're adding new features
|
|
66
|
+
|
|
67
|
+
## Testing Your Changes
|
|
68
|
+
|
|
69
|
+
Before submitting a PR:
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
# Install dependencies
|
|
73
|
+
npm install
|
|
74
|
+
|
|
75
|
+
# Run tests
|
|
76
|
+
npm test
|
|
77
|
+
|
|
78
|
+
# Run linting (if applicable)
|
|
79
|
+
npm run lint
|
|
80
|
+
|
|
81
|
+
# Build the project
|
|
82
|
+
npm run build
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
Make sure all tests pass and take a screenshot to include in your PR!
|
|
86
|
+
|
|
87
|
+
## Code of Conduct
|
|
88
|
+
|
|
89
|
+
- Be respectful and inclusive
|
|
90
|
+
- Provide constructive feedback
|
|
91
|
+
- Help others learn and grow
|
|
92
|
+
- Focus on what is best for the community
|
|
93
|
+
|
|
94
|
+
## Questions or Need Help?
|
|
95
|
+
|
|
96
|
+
- Open an issue for bugs or feature requests
|
|
97
|
+
- Start a discussion for questions or ideas
|
|
98
|
+
- Check existing issues and PRs before creating new ones
|
|
99
|
+
|
|
100
|
+
## Recognition
|
|
101
|
+
|
|
102
|
+
All contributors will be recognized in this file and in our release notes. Your contributions, no matter how small, are valuable and appreciated!
|
|
103
|
+
|
|
104
|
+
---
|
|
105
|
+
|
|
106
|
+
Thank you for making allProfanity better!
|
package/README.md
CHANGED
|
@@ -1,26 +1,59 @@
|
|
|
1
1
|
# AllProfanity
|
|
2
2
|
|
|
3
|
-
A blazing-fast, multi-language
|
|
3
|
+
A blazing-fast, multi-language profanity filter for JavaScript/TypeScript with advanced algorithms (Aho-Corasick, Bloom Filters) delivering **664% faster performance** on large texts, intelligent leet-speak detection, and pattern-based context analysis.
|
|
4
4
|
|
|
5
5
|
[](https://www.npmjs.com/package/allprofanity)
|
|
6
6
|
[](https://opensource.org/licenses/MIT)
|
|
7
7
|
|
|
8
8
|
---
|
|
9
9
|
|
|
10
|
+
## What's New in v2.2.0
|
|
11
|
+
|
|
12
|
+
- **Aho-Corasick Algorithm:** 664% faster on large texts (1KB+) with O(n) multi-pattern matching
|
|
13
|
+
- **Bloom Filters:** Lightning-fast probabilistic lookups reduce unnecessary checks
|
|
14
|
+
- **Result Caching:** 123x speedup on repeated inputs (perfect for chat apps and forms)
|
|
15
|
+
- **Pattern-Based Context Detection:** Reduces false positives in medical/negation contexts
|
|
16
|
+
- **Word Boundary Detection:** Smart whole-word matching prevents flagging "assassin" or "assistance"
|
|
17
|
+
- **Flexible Configuration:** Choose algorithm and trade-offs based on your use case
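To make the O(n) multi-pattern claim above concrete, here is a minimal sketch of Aho-Corasick matching. It is illustrative only: the names `AcNode`, `buildAutomaton`, and `findAll` are hypothetical and are not taken from the package's `dist/algos/aho-corasick.js`, whose actual API is not shown in this diff.

```typescript
// Illustrative Aho-Corasick sketch; all names here are hypothetical.
interface AcNode {
  next: Map<string, number>; // goto transitions keyed by character
  fail: number;              // failure link: state to fall back to on a mismatch
  out: string[];             // patterns that end at this state
}

interface AcMatch {
  pattern: string;
  start: number; // UTF-16 index where the match begins
}

function newNode(): AcNode {
  return { next: new Map(), fail: 0, out: [] };
}

function buildAutomaton(patterns: string[]): AcNode[] {
  const nodes: AcNode[] = [newNode()];
  // Step 1: insert every pattern into a trie.
  for (const pattern of patterns) {
    let state = 0;
    for (const ch of pattern) {
      let target = nodes[state].next.get(ch);
      if (target === undefined) {
        target = nodes.length;
        nodes.push(newNode());
        nodes[state].next.set(ch, target);
      }
      state = target;
    }
    nodes[state].out.push(pattern);
  }
  // Step 2: breadth-first pass that sets failure links and merges outputs.
  const queue: number[] = Array.from(nodes[0].next.values());
  for (let head = 0; head < queue.length; head++) {
    const state = queue[head];
    for (const [ch, child] of nodes[state].next) {
      let fallback = nodes[state].fail;
      while (fallback !== 0 && !nodes[fallback].next.has(ch)) {
        fallback = nodes[fallback].fail;
      }
      nodes[child].fail = nodes[fallback].next.get(ch) ?? 0;
      nodes[child].out.push(...nodes[nodes[child].fail].out);
      queue.push(child);
    }
  }
  return nodes;
}

// Scans the text once, reporting every pattern occurrence.
function findAll(nodes: AcNode[], text: string): AcMatch[] {
  const matches: AcMatch[] = [];
  let state = 0;
  let index = 0;
  for (const ch of text) {
    while (state !== 0 && !nodes[state].next.has(ch)) {
      state = nodes[state].fail;
    }
    state = nodes[state].next.get(ch) ?? 0;
    index += ch.length;
    for (const pattern of nodes[state].out) {
      matches.push({ pattern, start: index - pattern.length });
    }
  }
  return matches;
}
```

The automaton is built once per dictionary; each scan then costs time proportional to the text length plus the number of matches, which is why the approach pays off on larger inputs.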
|
|
18
|
+
|
|
19
|
+
[Read the full Performance Analysis →](./docs/SPEED_VS_ACCURACY.md)
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
10
23
|
## Features
|
|
11
24
|
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
- **Multiple
|
|
15
|
-
- **
|
|
16
|
-
- **
|
|
17
|
-
-
|
|
18
|
-
- **
|
|
19
|
-
- **
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
- **
|
|
25
|
+
### Performance & Speed
|
|
26
|
+
|
|
27
|
+
- **Multiple Algorithm Options:** Choose between Trie (default), Aho-Corasick, or Hybrid modes
|
|
28
|
+
- **664% Faster on Large Texts:** Aho-Corasick delivers O(n) multi-pattern matching
|
|
29
|
+
- **123x Speedup with Caching:** Result cache perfect for repeated checks (chat, forms, APIs)
|
|
30
|
+
- **~27K ops/sec:** Default Trie mode handles short texts incredibly fast
|
|
31
|
+
- **Single-Pass Scanning:** O(n) complexity regardless of dictionary size
|
|
32
|
+
- **Batch Processing Ready:** Optimized for high-throughput API endpoints
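As a rough illustration of how a Bloom filter can rule out words cheaply before a full dictionary lookup, the sketch below sizes the bit array from an expected item count and a target false-positive rate using the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. The class name `SketchBloomFilter` and its hashing scheme are assumptions made for this example, not the package's `dist/algos/bloom-filter.js` implementation.

```typescript
// Illustrative Bloom filter sketch; names and hash choice are hypothetical.
class SketchBloomFilter {
  readonly bitCount: number;  // m: number of bits in the filter
  readonly hashCount: number; // k: number of probe positions per item
  private readonly bits: Uint8Array;

  constructor(expectedItems: number, falsePositiveRate: number) {
    const ln2 = Math.LN2;
    this.bitCount = Math.ceil((-expectedItems * Math.log(falsePositiveRate)) / (ln2 * ln2));
    this.hashCount = Math.max(1, Math.round((this.bitCount / expectedItems) * ln2));
    this.bits = new Uint8Array(Math.ceil(this.bitCount / 8));
  }

  // FNV-1a style 32-bit hash over UTF-16 code units, with a seed mixed in.
  private fnv1a(value: string, seed: number): number {
    let hash = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      hash ^= value.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
  }

  // Double hashing: derive k positions from two base hashes.
  private positions(value: string): number[] {
    const h1 = this.fnv1a(value, 0);
    const h2 = (this.fnv1a(value, 0x9e3779b9) | 1) >>> 0;
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.bitCount);
    }
    return result;
  }

  add(value: string): void {
    for (const pos of this.positions(value)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // false means "definitely absent"; true means "possibly present".
  mightContain(value: string): boolean {
    return this.positions(value).every(
      (pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0
    );
  }
}
```

A filter like this never produces false negatives, so a `false` answer can skip the exact lookup entirely, while a `true` answer still needs confirmation against the real dictionary.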
|
|
33
|
+
|
|
34
|
+
### Accuracy & Detection
|
|
35
|
+
|
|
36
|
+
- **Word Boundary Matching:** Smart whole-word detection prevents false positives like "assassin" or "assistance"
|
|
37
|
+
- **Pattern-Based Context Detection:** Recognizes medical terms ("anal region") and negation patterns ("not bad")
|
|
38
|
+
- **Advanced Leet-Speak:** Detects obfuscated profanities (`f#ck`, `a55hole`, `sh1t`, etc.)
|
|
39
|
+
- **Comprehensive Coverage:** Catches profanity while minimizing false flags
|
|
40
|
+
- **Configurable Strictness:** Tune detection sensitivity to your needs
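The word-boundary and leet-speak bullets above can be sketched together as: normalize common character substitutions, split the text into whole tokens, and compare each token against the blocked set. The substitution table `LEET` and the function names are hypothetical, and the package's real normalization rules may differ.

```typescript
// Illustrative sketch; the substitution table is an assumption, not the package's.
const LEET: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

// Lowercases the text and maps leet characters back to letters.
function normalizeLeet(text: string): string {
  return Array.from(text.toLowerCase(), (ch) => LEET[ch] ?? ch).join("");
}

// Flags the text only when a complete token equals a blocked word.
function containsWholeWord(text: string, blocked: ReadonlySet<string>): boolean {
  // Split on anything that is not a Unicode letter or digit, so that
  // non-Latin scripts are tokenized too (plain \b only understands ASCII words).
  const tokens = normalizeLeet(text).split(/[^\p{L}\p{N}]+/u);
  return tokens.some((token) => blocked.has(token));
}

const blockedWords = new Set(["ass", "shit"]);
console.log(containsWholeWord("The assassin asked for assistance", blockedWords));
console.log(containsWholeWord("well, sh1t happens", blockedWords));
```

One trade-off of this naive mapping is that genuine numbers are also rewritten (for example `15` becomes `is`), so a production filter would need extra rules to decide when substitution applies.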
|
|
41
|
+
|
|
42
|
+
### Multi-Language & Flexibility
|
|
43
|
+
|
|
44
|
+
- **Multi-Language Support:** Built-in dictionaries for English, Hindi, French, German, Spanish, Bengali, Tamil, Telugu, Brazilian Portuguese
|
|
45
|
+
- **Multiple Scripts:** Latin/Roman (Hinglish) and native scripts (Devanagari, Tamil, Telugu, etc.)
|
|
46
|
+
- **Custom Dictionaries:** Add/remove words or entire language packs at runtime
|
|
47
|
+
- **Whitelisting:** Exclude safe words from detection
|
|
48
|
+
- **Severity Scoring:** Assess content offensiveness (`MILD`, `MODERATE`, `SEVERE`, `EXTREME`)
|
|
49
|
+
|
|
50
|
+
### Developer Experience
|
|
51
|
+
|
|
52
|
+
- **TypeScript Support:** Fully typed API with comprehensive documentation
|
|
53
|
+
- **Zero 3rd-Party Dependencies:** Only internal code and data
|
|
54
|
+
- **Configurable:** Tune performance vs accuracy for your use case
|
|
55
|
+
- **No Dictionary Exposure:** Secure by design - word lists never exposed
|
|
56
|
+
- **Universal:** Works in Node.js and browsers
|
|
24
57
|
|
|
25
58
|
---
|
|
26
59
|
|
|
@@ -32,6 +65,13 @@ npm install allprofanity
|
|
|
32
65
|
yarn add allprofanity
|
|
33
66
|
```
|
|
34
67
|
|
|
68
|
+
**Generate configuration file (optional):**
|
|
69
|
+
|
|
70
|
+
```bash
|
|
71
|
+
npx allprofanity
|
|
72
|
+
# Creates allprofanity.config.json and config.schema.json in your project
|
|
73
|
+
```
|
|
74
|
+
|
|
35
75
|
---
|
|
36
76
|
|
|
37
77
|
## Quick Start
|
|
@@ -48,6 +88,162 @@ profanity.check('Ye ek chutiya test hai.'); // true (Hinglish Roman scr
|
|
|
48
88
|
|
|
49
89
|
---
|
|
50
90
|
|
|
91
|
+
## Algorithm Configuration
|
|
92
|
+
|
|
93
|
+
AllProfanity v2.2+ offers multiple algorithms optimized for different use cases. You can configure via **constructor options** or **config file**.
|
|
94
|
+
|
|
95
|
+
### Configuration Methods
|
|
96
|
+
|
|
97
|
+
#### Method 1: Constructor Options (Inline)
|
|
98
|
+
|
|
99
|
+
```typescript
|
|
100
|
+
import { AllProfanity } from 'allprofanity';
|
|
101
|
+
|
|
102
|
+
const filter = new AllProfanity({
|
|
103
|
+
algorithm: { matching: "hybrid" },
|
|
104
|
+
performance: { enableCaching: true }
|
|
105
|
+
});
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
#### Method 2: Config File (Recommended)
|
|
109
|
+
|
|
110
|
+
```bash
|
|
111
|
+
# Generate config files in your project
|
|
112
|
+
npx allprofanity
|
|
113
|
+
|
|
114
|
+
# This creates:
|
|
115
|
+
# - allprofanity.config.json (main config)
|
|
116
|
+
# - config.schema.json (for IDE autocomplete)
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
```typescript
|
|
120
|
+
import { AllProfanity } from 'allprofanity';
|
|
121
|
+
import config from './allprofanity.config.json';
|
|
122
|
+
|
|
123
|
+
// Load from generated config file
|
|
124
|
+
const filter = AllProfanity.fromConfig(config);
|
|
125
|
+
|
|
126
|
+
// Or directly from object (no file needed)
|
|
127
|
+
const filter2 = AllProfanity.fromConfig({
|
|
128
|
+
algorithm: { matching: "hybrid", useContextAnalysis: true },
|
|
129
|
+
performance: { enableCaching: true, cacheSize: 1000 }
|
|
130
|
+
});
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
**Example Config File** (`allprofanity.config.json`):
|
|
134
|
+
|
|
135
|
+
```json
|
|
136
|
+
{
|
|
137
|
+
"algorithm": {
|
|
138
|
+
"matching": "hybrid",
|
|
139
|
+
"useAhoCorasick": true,
|
|
140
|
+
"useBloomFilter": true,
|
|
141
|
+
"useContextAnalysis": true
|
|
142
|
+
},
|
|
143
|
+
"contextAnalysis": {
|
|
144
|
+
"enabled": true,
|
|
145
|
+
"contextWindow": 50,
|
|
146
|
+
"languages": ["en"],
|
|
147
|
+
"scoreThreshold": 0.5
|
|
148
|
+
},
|
|
149
|
+
"profanityDetection": {
|
|
150
|
+
"enableLeetSpeak": true,
|
|
151
|
+
"caseSensitive": false,
|
|
152
|
+
"strictMode": false
|
|
153
|
+
},
|
|
154
|
+
"performance": {
|
|
155
|
+
"enableCaching": true,
|
|
156
|
+
"cacheSize": 1000
|
|
157
|
+
}
|
|
158
|
+
}
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
**Config File:** Run `npx allprofanity` to generate config files in your project. The JSON schema provides IDE autocomplete and validation.
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
### Quick Configuration Examples
|
|
166
|
+
|
|
167
|
+
#### 1. Default (Best for General Use)
|
|
168
|
+
|
|
169
|
+
```typescript
|
|
170
|
+
import { AllProfanity } from 'allprofanity';
|
|
171
|
+
const filter = new AllProfanity();
|
|
172
|
+
// Uses optimized Trie - fast and reliable (~27K ops/sec)
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
#### 2. Large Text Processing (Documents, Articles)
|
|
176
|
+
|
|
177
|
+
```typescript
|
|
178
|
+
const filter = new AllProfanity({
|
|
179
|
+
algorithm: { matching: "aho-corasick" }
|
|
180
|
+
});
|
|
181
|
+
// 664% faster on 1KB+ texts
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
#### 3. Reduced False Positives (Social Media, Content Moderation)
|
|
185
|
+
|
|
186
|
+
```typescript
|
|
187
|
+
const filter = new AllProfanity({
|
|
188
|
+
algorithm: {
|
|
189
|
+
matching: "hybrid",
|
|
190
|
+
useBloomFilter: true,
|
|
191
|
+
useAhoCorasick: true,
|
|
192
|
+
useContextAnalysis: true
|
|
193
|
+
},
|
|
194
|
+
contextAnalysis: {
|
|
195
|
+
enabled: true,
|
|
196
|
+
contextWindow: 50,
|
|
197
|
+
languages: ["en"],
|
|
198
|
+
scoreThreshold: 0.5
|
|
199
|
+
}
|
|
200
|
+
});
|
|
201
|
+
// Pattern-based context detection reduces medical/negation false positives
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
#### 4. Repeated Checks (Chat, Forms, APIs)
|
|
205
|
+
|
|
206
|
+
```typescript
|
|
207
|
+
const filter = new AllProfanity({
|
|
208
|
+
performance: {
|
|
209
|
+
enableCaching: true,
|
|
210
|
+
cacheSize: 1000
|
|
211
|
+
}
|
|
212
|
+
});
|
|
213
|
+
// 123x speedup on cache hits
|
|
214
|
+
```
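For intuition about what a bounded result cache does on repeated inputs, the sketch below wraps any `check`-style function with a small least-recently-used cache. The eviction policy is an assumption (the README only documents `enableCaching` and `cacheSize`), and `LruResultCache` and `withCache` are hypothetical names.

```typescript
// Illustrative LRU cache sketch; the package's eviction policy is not documented here.
class LruResultCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxSize) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }
}

// Wraps a boolean check so repeated inputs skip the underlying scan.
function withCache(
  check: (text: string) => boolean,
  cacheSize = 1000
): (text: string) => boolean {
  const cache = new LruResultCache<boolean>(cacheSize);
  return (text: string): boolean => {
    const hit = cache.get(text);
    if (hit !== undefined) return hit;
    const result = check(text);
    cache.set(text, result);
    return result;
  };
}
```

The speedup from any such cache depends entirely on the hit rate, so it helps most where the same strings recur, such as form re-validation or repeated chat messages.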
|
|
215
|
+
|
|
216
|
+
#### 5. Medical/Professional Content
|
|
217
|
+
|
|
218
|
+
```typescript
|
|
219
|
+
const filter = new AllProfanity({
|
|
220
|
+
algorithm: {
|
|
221
|
+
matching: "hybrid",
|
|
222
|
+
useContextAnalysis: true
|
|
223
|
+
},
|
|
224
|
+
contextAnalysis: {
|
|
225
|
+
enabled: true,
|
|
226
|
+
contextWindow: 100,
|
|
227
|
+
scoreThreshold: 0.7 // Higher threshold = less sensitive
|
|
228
|
+
}
|
|
229
|
+
});
|
|
230
|
+
// Reduces false positives from medical terms using keyword patterns
|
|
231
|
+
```
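The following is a rough sketch of how a keyword-based context check with a `contextWindow` and `scoreThreshold` could work: look at the characters around a match, lower the score when medical or negation cues appear, and flag only when the score stays at or above the threshold. The keyword lists, the 0.6 penalties, and the function names are all invented for illustration; the package's `dist/algos/context-patterns.js` may score matches quite differently.

```typescript
// Illustrative context scoring sketch; keywords and weights are hypothetical.
interface ContextOptions {
  contextWindow: number;  // characters inspected on each side of the match
  scoreThreshold: number; // flag only when the score is at or above this value
}

const MEDICAL_HINTS = ["region", "exam", "surgery", "patient", "clinical"];
const NEGATION_BEFORE = /\b(not|never|no)\s+$/;

// Returns a score in [0, 1]; lower means the match looks more benign.
function contextScore(
  text: string,
  start: number,
  end: number,
  options: ContextOptions
): number {
  const before = text.slice(Math.max(0, start - options.contextWindow), start).toLowerCase();
  const after = text.slice(end, end + options.contextWindow).toLowerCase();
  const nearby = before + " " + after;

  let score = 1;
  if (MEDICAL_HINTS.some((hint) => nearby.includes(hint))) score -= 0.6;
  if (NEGATION_BEFORE.test(before)) score -= 0.6;
  return Math.max(0, score);
}

function shouldFlag(
  text: string,
  start: number,
  end: number,
  options: ContextOptions
): boolean {
  return contextScore(text, start, end, options) >= options.scoreThreshold;
}
```

Under this scheme a higher `scoreThreshold` flags fewer matches, which lines up with the "Higher threshold = less sensitive" comment in the example above.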
|
|
232
|
+
|
|
233
|
+
### Performance Characteristics
|
|
234
|
+
|
|
235
|
+
| Use Case | Algorithm | Speed | Detection | Best For |
|
|
236
|
+
|----------|-----------|-------|----------|----------|
|
|
237
|
+
| Short texts (<500 chars) | Trie (default) | ~27K ops/sec | Excellent | Chat, comments |
|
|
238
|
+
| Large texts (1KB+) | Aho-Corasick | ~9.6K ops/sec | Excellent | Documents, articles |
|
|
239
|
+
| Repeated patterns | Any + Caching | 123x faster | Excellent | Forms, validation |
|
|
240
|
+
| Content moderation | Hybrid + Context | Moderate | Good (fewer false positives) | Social media, UGC |
|
|
241
|
+
| Professional content | Hybrid + Context (strict) | Moderate | Reduced false flags | Medical, academic |
|
|
242
|
+
|
|
243
|
+
[See detailed benchmarks and comparisons →](./docs/SPEED_VS_ACCURACY.md)
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
|
|
51
247
|
## API Reference & Examples
|
|
52
248
|
|
|
53
249
|
### `check(text: string): boolean`
|
|
@@ -143,11 +339,11 @@ profanity.check('Esto es mierda.'); // false
|
|
|
143
339
|
Whitelist words so they are never flagged as profane.
|
|
144
340
|
|
|
145
341
|
```typescript
|
|
146
|
-
profanity.addToWhitelist(['
|
|
147
|
-
profanity.check('He is an
|
|
148
|
-
profanity.check('
|
|
342
|
+
profanity.addToWhitelist(['fuck', 'idiot', 'shit']);
|
|
343
|
+
profanity.check('He is a fucking idiot.'); // false
|
|
344
|
+
profanity.check('Fuck this shit.'); // false
|
|
149
345
|
// Remove from whitelist to restore detection
|
|
150
|
-
profanity.removeFromWhitelist(['
|
|
346
|
+
profanity.removeFromWhitelist(['fuck', 'idiot', 'shit']);
|
|
151
347
|
```
|
|
152
348
|
|
|
153
349
|
---
|
|
@@ -250,7 +446,7 @@ Returns the names of all available built-in language packs.
|
|
|
250
446
|
|
|
251
447
|
```typescript
|
|
252
448
|
console.log(profanity.getAvailableLanguages());
|
|
253
|
-
// ['english', 'hindi', 'french', 'german', 'spanish', 'bengali', 'tamil', 'telugu']
|
|
449
|
+
// ['english', 'hindi', 'french', 'german', 'spanish', 'bengali', 'tamil', 'telugu', 'brazilian']
|
|
254
450
|
```
|
|
255
451
|
|
|
256
452
|
---
|
|
@@ -289,6 +485,133 @@ console.log(profanity.getConfig());
|
|
|
289
485
|
|
|
290
486
|
---
|
|
291
487
|
|
|
488
|
+
## Configuration File Structure
|
|
489
|
+
|
|
490
|
+
AllProfanity supports JSON-based configuration for easy setup and deployment. The config file structure supports all algorithm and detection options.
|
|
491
|
+
|
|
492
|
+
### Full Configuration Schema
|
|
493
|
+
|
|
494
|
+
```typescript
|
|
495
|
+
{
|
|
496
|
+
"algorithm": {
|
|
497
|
+
"matching": "trie" | "aho-corasick" | "hybrid", // Algorithm selection
|
|
498
|
+
"useAhoCorasick": boolean, // Enable Aho-Corasick
|
|
499
|
+
"useBloomFilter": boolean, // Enable Bloom Filter
|
|
500
|
+
"useContextAnalysis": boolean // Enable context analysis
|
|
501
|
+
},
|
|
502
|
+
"bloomFilter": {
|
|
503
|
+
"enabled": boolean, // Enable/disable
|
|
504
|
+
"expectedItems": number, // Expected dictionary size (default: 10000)
|
|
505
|
+
"falsePositiveRate": number // Acceptable false positive rate (default: 0.01)
|
|
506
|
+
},
|
|
507
|
+
"ahoCorasick": {
|
|
508
|
+
"enabled": boolean, // Enable/disable
|
|
509
|
+
"prebuild": boolean // Prebuild automaton (default: true)
|
|
510
|
+
},
|
|
511
|
+
"contextAnalysis": {
|
|
512
|
+
"enabled": boolean, // Enable/disable pattern-based context detection
|
|
513
|
+
"contextWindow": number, // Characters around match to check (default: 50)
|
|
514
|
+
"languages": string[], // Languages for keyword patterns (default: ["en"])
|
|
515
|
+
"scoreThreshold": number // Detection threshold 0-1 (default: 0.5)
|
|
516
|
+
},
|
|
517
|
+
"profanityDetection": {
|
|
518
|
+
"enableLeetSpeak": boolean, // Detect l33t speak (default: true)
|
|
519
|
+
"caseSensitive": boolean, // Case sensitive matching (default: false)
|
|
520
|
+
"strictMode": boolean, // Require word boundaries (default: false)
|
|
521
|
+
"detectPartialWords": boolean, // Detect within words (default: false)
|
|
522
|
+
"defaultPlaceholder": string // Default censoring character (default: "*")
|
|
523
|
+
},
|
|
524
|
+
"performance": {
|
|
525
|
+
"enableCaching": boolean, // Enable result cache (default: false)
|
|
526
|
+
"cacheSize": number // Cache size limit (default: 1000)
|
|
527
|
+
}
|
|
528
|
+
}
|
|
529
|
+
```
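The schema above is written as annotated pseudo-JSON. As a sketch, the same shape can be expressed as a TypeScript interface together with the defaults stated in its comments. `ConfigSketch`, `DOCUMENTED_DEFAULTS`, and `applyDocumentedDefaults` are hypothetical names for this example; the types actually exported from `dist/index.d.ts` are not shown in this diff and may differ.

```typescript
// Illustrative typing of the documented config shape; names are hypothetical.
type MatchingAlgorithm = "trie" | "aho-corasick" | "hybrid";

interface ConfigSketch {
  algorithm?: {
    matching?: MatchingAlgorithm;
    useAhoCorasick?: boolean;
    useBloomFilter?: boolean;
    useContextAnalysis?: boolean;
  };
  bloomFilter?: { enabled?: boolean; expectedItems?: number; falsePositiveRate?: number };
  ahoCorasick?: { enabled?: boolean; prebuild?: boolean };
  contextAnalysis?: {
    enabled?: boolean;
    contextWindow?: number;
    languages?: string[];
    scoreThreshold?: number;
  };
  profanityDetection?: {
    enableLeetSpeak?: boolean;
    caseSensitive?: boolean;
    strictMode?: boolean;
    detectPartialWords?: boolean;
    defaultPlaceholder?: string;
  };
  performance?: { enableCaching?: boolean; cacheSize?: number };
}

// Defaults copied from the comments in the schema above.
const DOCUMENTED_DEFAULTS = {
  algorithm: { matching: "trie" as MatchingAlgorithm },
  bloomFilter: { expectedItems: 10000, falsePositiveRate: 0.01 },
  ahoCorasick: { prebuild: true },
  contextAnalysis: { contextWindow: 50, languages: ["en"], scoreThreshold: 0.5 },
  profanityDetection: {
    enableLeetSpeak: true,
    caseSensitive: false,
    strictMode: false,
    detectPartialWords: false,
    defaultPlaceholder: "*",
  },
  performance: { enableCaching: false, cacheSize: 1000 },
};

// Shallow per-section merge: user values win over documented defaults.
function applyDocumentedDefaults(config: ConfigSketch) {
  return {
    algorithm: { ...DOCUMENTED_DEFAULTS.algorithm, ...config.algorithm },
    bloomFilter: { ...DOCUMENTED_DEFAULTS.bloomFilter, ...config.bloomFilter },
    ahoCorasick: { ...DOCUMENTED_DEFAULTS.ahoCorasick, ...config.ahoCorasick },
    contextAnalysis: { ...DOCUMENTED_DEFAULTS.contextAnalysis, ...config.contextAnalysis },
    profanityDetection: { ...DOCUMENTED_DEFAULTS.profanityDetection, ...config.profanityDetection },
    performance: { ...DOCUMENTED_DEFAULTS.performance, ...config.performance },
  };
}
```

Typing the config this way lets the compiler catch misspelled keys or an invalid `matching` value before the object reaches `AllProfanity.fromConfig`.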
|
|
530
|
+
|
|
531
|
+
### Pre-configured Templates
|
|
532
|
+
|
|
533
|
+
#### High Performance (Large Texts)
|
|
534
|
+
|
|
535
|
+
```json
|
|
536
|
+
{
|
|
537
|
+
"algorithm": { "matching": "aho-corasick" },
|
|
538
|
+
"ahoCorasick": { "enabled": true, "prebuild": true },
|
|
539
|
+
"profanityDetection": { "enableLeetSpeak": true }
|
|
540
|
+
}
|
|
541
|
+
```
|
|
542
|
+
|
|
543
|
+
#### Reduced False Positives (Content Moderation)
|
|
544
|
+
|
|
545
|
+
```json
|
|
546
|
+
{
|
|
547
|
+
"algorithm": {
|
|
548
|
+
"matching": "hybrid",
|
|
549
|
+
"useContextAnalysis": true,
|
|
550
|
+
"useBloomFilter": true
|
|
551
|
+
},
|
|
552
|
+
"contextAnalysis": {
|
|
553
|
+
"enabled": true,
|
|
554
|
+
"contextWindow": 50,
|
|
555
|
+
"scoreThreshold": 0.5
|
|
556
|
+
},
|
|
557
|
+
"performance": { "enableCaching": true }
|
|
558
|
+
}
|
|
559
|
+
```
|
|
560
|
+
|
|
561
|
+
#### Balanced (Production)
|
|
562
|
+
|
|
563
|
+
```json
|
|
564
|
+
{
|
|
565
|
+
"algorithm": {
|
|
566
|
+
"matching": "hybrid",
|
|
567
|
+
"useAhoCorasick": true,
|
|
568
|
+
"useBloomFilter": true
|
|
569
|
+
},
|
|
570
|
+
"profanityDetection": { "enableLeetSpeak": true },
|
|
571
|
+
"performance": { "enableCaching": true, "cacheSize": 1000 }
|
|
572
|
+
}
|
|
573
|
+
```
|
|
574
|
+
|
|
575
|
+
### Using Config Files
|
|
576
|
+
|
|
577
|
+
**Step 1: Generate Config Files**
|
|
578
|
+
|
|
579
|
+
```bash
|
|
580
|
+
# Run this in your project directory
|
|
581
|
+
npx allprofanity
|
|
582
|
+
|
|
583
|
+
# Output:
|
|
584
|
+
# ✅ AllProfanity configuration files created!
|
|
585
|
+
#
|
|
586
|
+
# Created files:
|
|
587
|
+
#   allprofanity.config.json - Main configuration
|
|
588
|
+
#   config.schema.json - JSON schema for IDE autocomplete
|
|
589
|
+
```
|
|
590
|
+
|
|
591
|
+
**Step 2: Load Config in Your Code**
|
|
592
|
+
|
|
593
|
+
```typescript
|
|
594
|
+
// ES Modules / TypeScript
|
|
595
|
+
import { AllProfanity } from 'allprofanity';
|
|
596
|
+
import config from './allprofanity.config.json';
|
|
597
|
+
|
|
598
|
+
const filter = AllProfanity.fromConfig(config);
|
|
599
|
+
```
|
|
600
|
+
|
|
601
|
+
```javascript
|
|
602
|
+
// CommonJS (Node.js)
|
|
603
|
+
const { AllProfanity } = require('allprofanity');
|
|
604
|
+
const config = require('./allprofanity.config.json');
|
|
605
|
+
|
|
606
|
+
const filter = AllProfanity.fromConfig(config);
|
|
607
|
+
```
|
|
608
|
+
|
|
609
|
+
**Step 3: Customize Config**
|
|
610
|
+
|
|
611
|
+
Edit `allprofanity.config.json` to enable/disable features. Your IDE will provide autocomplete thanks to the JSON schema!
|
|
612
|
+
|
|
613
|
+
---
|
|
614
|
+
|
|
292
615
|
## Severity Levels
|
|
293
616
|
|
|
294
617
|
Severity reflects the number and variety of detected profanities:
|
|
@@ -304,13 +627,14 @@ Severity reflects the number and variety of detected profanities:
|
|
|
304
627
|
|
|
305
628
|
## Language Support
|
|
306
629
|
|
|
307
|
-
- **Built-in:** English, Hindi, French, German, Spanish, Bengali, Tamil, Telugu
|
|
630
|
+
- **Built-in:** English, Hindi, French, German, Spanish, Bengali, Tamil, Telugu, Brazilian Portuguese
|
|
308
631
|
- **Scripts:** Latin/Roman, Devanagari, Tamil, Telugu, Bengali, etc.
|
|
309
632
|
- **Mixed Content:** Handles mixed-language and code-switched sentences.
|
|
310
633
|
|
|
311
634
|
```typescript
|
|
312
635
|
profanity.check('This is bullshit and ā¤āĨ⤤ā¤ŋā¤¯ā¤ž.'); // true (mixed English/Hindi)
|
|
313
636
|
profanity.check('Ce mot est merde and ā¤Ēā¤žā¤ā¤˛.'); // true (French/Hindi)
|
|
637
|
+
profanity.check('Isso é uma merda.'); // true (Brazilian Portuguese)
|
|
314
638
|
```
|
|
315
639
|
|
|
316
640
|
---
|
|
@@ -389,10 +713,11 @@ A: Yes! AllProfanity is universal.
|
|
|
389
713
|
|
|
390
714
|
## Roadmap
|
|
391
715
|
|
|
392
|
-
-
|
|
393
|
-
-
|
|
394
|
-
-
|
|
395
|
-
-
|
|
716
|
+
- 🚧 Multi-language context analysis (Hindi, Spanish, etc.)
|
|
717
|
+
- 🚧 Phonetic matching (sounds-like detection)
|
|
718
|
+
- 🚧 More language packs (Arabic, Russian, Japanese, etc.)
|
|
719
|
+
- 🚧 Machine learning integration for adaptive scoring
|
|
720
|
+
- 🚧 Plugin system for custom detection algorithms
|
|
396
721
|
|
|
397
722
|
---
|
|
398
723
|
|
|
@@ -404,6 +729,9 @@ MIT â See [LICENSE](https://github.com/ayush-jadaun/allprofanity/blob/main/LIC
|
|
|
404
729
|
|
|
405
730
|
## Contributing
|
|
406
731
|
|
|
407
|
-
We welcome
|
|
408
|
-
|
|
409
|
-
|
|
732
|
+
We welcome contributions! Please see our [CONTRIBUTORS.md](./CONTRIBUTORS.md) for:
|
|
733
|
+
|
|
734
|
+
- How to add your name to our contributors list
|
|
735
|
+
- Guidelines for adding new languages
|
|
736
|
+
- Test requirements (must include passing test screenshots in PRs)
|
|
737
|
+
- Code of conduct and PR guidelines
|
package/allprofanity.config.example.json
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
{
|
|
2
|
+
"$schema": "./config.schema.json",
|
|
3
|
+
"algorithm": {
|
|
4
|
+
"matching": "hybrid",
|
|
5
|
+
"useAhoCorasick": true,
|
|
6
|
+
"useBloomFilter": true,
|
|
7
|
+
"useContextAnalysis": true
|
|
8
|
+
},
|
|
9
|
+
"bloomFilter": {
|
|
10
|
+
"enabled": true,
|
|
11
|
+
"expectedItems": 10000,
|
|
12
|
+
"falsePositiveRate": 0.01
|
|
13
|
+
},
|
|
14
|
+
"ahoCorasick": {
|
|
15
|
+
"enabled": true,
|
|
16
|
+
"prebuild": true
|
|
17
|
+
},
|
|
18
|
+
"contextAnalysis": {
|
|
19
|
+
"enabled": true,
|
|
20
|
+
"contextWindow": 50,
|
|
21
|
+
"languages": ["en"],
|
|
22
|
+
"scoreThreshold": 0.5
|
|
23
|
+
},
|
|
24
|
+
"profanityDetection": {
|
|
25
|
+
"enableLeetSpeak": true,
|
|
26
|
+
"caseSensitive": false,
|
|
27
|
+
"strictMode": false,
|
|
28
|
+
"detectPartialWords": false,
|
|
29
|
+
"defaultPlaceholder": "*"
|
|
30
|
+
},
|
|
31
|
+
"performance": {
|
|
32
|
+
"cacheSize": 1000,
|
|
33
|
+
"enableCaching": true
|
|
34
|
+
}
|
|
35
|
+
}
|
package/bin/init.js
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
|
|
3
|
+
import { readFileSync, writeFileSync, existsSync } from 'fs';
|
|
4
|
+
import { fileURLToPath } from 'url';
|
|
5
|
+
import { dirname, join } from 'path';
|
|
6
|
+
|
|
7
|
+
const __filename = fileURLToPath(import.meta.url);
|
|
8
|
+
const __dirname = dirname(__filename);
|
|
9
|
+
|
|
10
|
+
const configFileName = 'allprofanity.config.json';
|
|
11
|
+
const schemaFileName = 'config.schema.json';
|
|
12
|
+
|
|
13
|
+
// Check if config already exists
|
|
14
|
+
if (existsSync(configFileName)) {
|
|
15
|
+
console.log(`${configFileName} already exists in current directory`);
|
|
16
|
+
console.log(' Delete it first or use a different name');
|
|
17
|
+
process.exit(1);
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
try {
|
|
21
|
+
// Copy example config to current directory
|
|
22
|
+
const examplePath = join(__dirname, '..', 'allprofanity.config.example.json');
|
|
23
|
+
const schemaPath = join(__dirname, '..', 'config.schema.json');
|
|
24
|
+
|
|
25
|
+
const configContent = readFileSync(examplePath, 'utf-8');
|
|
26
|
+
const schemaContent = readFileSync(schemaPath, 'utf-8');
|
|
27
|
+
|
|
28
|
+
writeFileSync(configFileName, configContent);
|
|
29
|
+
writeFileSync(schemaFileName, schemaContent);
|
|
30
|
+
|
|
31
|
+
console.log('✅ AllProfanity configuration files created!');
|
|
32
|
+
console.log('');
|
|
33
|
+
console.log('Created files:');
|
|
34
|
+
console.log(`  ${configFileName} - Main configuration`);
|
|
35
|
+
console.log(`  ${schemaFileName} - JSON schema for IDE autocomplete`);
|
|
36
|
+
console.log('');
|
|
37
|
+
console.log('Next steps:');
|
|
38
|
+
console.log(' 1. Edit allprofanity.config.json to customize settings');
|
|
39
|
+
console.log(' 2. Import and use:');
|
|
40
|
+
console.log('');
|
|
41
|
+
console.log(' import { AllProfanity } from "allprofanity";');
|
|
42
|
+
console.log(' import config from "./allprofanity.config.json";');
|
|
43
|
+
console.log(' const filter = AllProfanity.fromConfig(config);');
|
|
44
|
+
console.log('');
|
|
45
|
+
|
|
46
|
+
} catch (error) {
|
|
47
|
+
console.error('Error creating config files:', error.message);
|
|
48
|
+
process.exit(1);
|
|
49
|
+
}
|