@redsocs/spam-warden 1.2.3 → 1.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,151 +6,206 @@ Lightweight, universal JavaScript library for real-time spam detection and autom
6
6
  [![npm](https://img.shields.io/npm/v/%40redsocs%2Fspam-warden.svg)](https://www.npmjs.com/package/@redsocs/spam-warden)
7
7
  [![Sponsor](https://img.shields.io/badge/Sponsor-Buy%20Me%20a%20Coffee-ffdd00?style=flat&logo=buy-me-a-coffee&logoColor=black)](https://buymeacoffee.com/redsocs?new=1)
8
8
 
9
- # What is this?
9
+ ---
10
+
11
+ ## What is this?
12
+
13
+ **SpamWarden.js** is a zero-dependency, universal engine that detects spam directly at the source. It is engineered specifically to combat the regional surge in gambling, loan, and "fast money" spam campaigns targeting public sector and enterprise platforms.
14
+
15
+ By running natively in the browser, it intercepts malicious payloads before they reach your database, saving server resources, preserving data integrity, and (when telemetry is enabled) feeding sanitized threat intelligence into your SIEM.
16
+
17
+ 👉 **[Live Demo, Scanner & Script Generator](https://spam-warden-js-527b79.gitlab.io/)**
18
+
19
+ ---
20
+
21
+ ## 🛑 A Note on Honesty: Our War on Spammers (Old vs. New)
22
+
23
+ Let’s be completely transparent about how this library has evolved.
24
+
25
+ In our earlier versions, our absolute obsession with punishing casino and loan botnets led us to build something genuinely brutal. We deployed aggressive DOM scraping, intentional memory leaks, infinite `history.replaceState` loops, and hidden `debugger` traps. The goal was simple: crash their headless browsers and completely destroy their automated assets.
26
+
27
+ **The problem?** If you build a script that behaves exactly like hostile malware, enterprise Static Application Security Testing (SAST) tools will flag it as hostile malware. Government and corporate scanners took one look at our aggressive heuristic scraping and threw critical alerts for "stealth loader behavior" and "anti-analysis malware."
28
+
29
+ **The Solution:** We had to grow up and build an enterprise-compliant architecture. We stripped out the browser-crashing loops, the wild DOM scraping, and the sketchy `atob` obfuscation.
30
+
31
+ But we did not surrender. Instead of crashing their machines, we now target their operational costs. The new architecture replaces the brutal malware traps with a **Proof of Work (PoW) Tarpit**. It is built to pass SAST compliance audits, but if a headless bot bypasses the UI and executes one of our decoy engines, it is forced into a bounded hashing loop that burns compute credits on every attack it attempts.
32
+
33
+ ---
34
+
35
+ ## What's Inside: The Hybrid Detection Engine
36
+
37
+ SpamWarden doesn't rely on a single method. It processes input through a strict three-phase pipeline designed to be fast and to avoid the constructs that security scanners flag (no `eval()`, no DOM injection).
10
38
 
11
- **SpamWarden.js** is a zero-dependency, universal engine that detects spam directly at the source. It uses a **Present-Only Naive Bayes** model (derived from Bernoulli Naive Bayes) trained specifically on Thai spam patterns (gambling, loans, "fast money" scams) and optimized with a dynamic, length-calibrated decision threshold to eliminate false positives on longer, clean text.
39
+ ### Phase 1: The "Lightcheck" (Zero-Math Blocking)
12
40
 
13
- By running natively, it allows you to **block spam before it ever hits your database**, saving server resources and keeping your data clean.
41
+ Before invoking the heavier machine-learning model, the engine runs a quick sweep for hard-coded spam patterns.
14
42
 
15
- ![SIEM Endpoint & Spam Block Demo](https://cdn.redsocs.com/assets/siem-endpoint-spamblock-demo.gif)
43
+ - Instantly blocks isolated currency symbols (`$`, `€`, `£`, `฿`) often used in fast-money scams.
44
+ - Instantly blocks known spam link shorteners and redirectors (`line[dot]me`, `bit[dot]ly`).
45
+ - _Result: Zero CPU wasted on obvious bot blasts._
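A minimal sketch of the Phase 1 idea, assuming simple regular-expression rules. The `lightcheck` function, the `spam_link` rule name, and the exact patterns are hypothetical (only `currency_symbol` appears in this README's sample log). The sketch flags any listed currency symbol; the library's notion of an "isolated" symbol may be stricter.

```javascript
// Hypothetical Phase 1 rules: cheap regex checks that run before any model scoring.
// The patterns mirror the examples above; they are not the library's rule set.
const LIGHTCHECK_RULES = [
  // Any listed currency symbol (the real engine may require it to be "isolated").
  { name: "currency_symbol", pattern: /[$€£฿]/u },
  // Shortener/redirector hosts, in plain ("bit.ly") or defanged ("bit[dot]ly") form.
  { name: "spam_link", pattern: /\bline(?:\.|\[dot\])me\b|\bbit(?:\.|\[dot\])ly\b/iu },
];

function lightcheck(text) {
  for (const { name, pattern } of LIGHTCHECK_RULES) {
    if (pattern.test(text)) {
      return { isSpam: true, reason: name };
    }
  }
  // Nothing matched: the text continues to the tokenizer and classifier.
  return { isSpam: false, reason: null };
}

console.log(lightcheck("Win ฿500 today").reason); // → currency_symbol
console.log(lightcheck("add line[dot]me/abc").isSpam); // → true
```

Because these rules are plain string scans, they cost almost nothing compared with scoring thousands of n-gram features.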
16
46
 
17
- # Live Demo & Scanner
47
+ ### Phase 2: The Thai-Optimized Tokenizer
18
48
 
19
- You can test the spam engine interactively, analyze your forms, and generate auto-blocking script configurations directly on our GitLab Pages site:
49
+ Standard Western spam filters split text on spaces, which fails for Thai, a language that does not put spaces between words.
20
50
 
21
- 👉 **[Live Demo & Generator](https://spam-warden-js-527b79.gitlab.io/)**
51
+ - The engine strips all whitespace from the input, then generates **trigrams** (3-character groups) and **quadgrams** (4-character groups) across the entire string.
52
+ - _Result: Spam-indicative character sequences surface even inside space-less Thai text._
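A minimal sketch of the Phase 2 tokenizer, assuming that "character" means a Unicode code point (so Thai vowel and tone marks count individually) and that no case folding or other normalization takes place. `ngramFeatures` is a hypothetical name, not the library's API.

```javascript
// Hypothetical Phase 2 tokenizer: drop all whitespace, then emit every
// 3- and 4-character window. Array.from splits on code points, so Thai vowel
// and tone marks count as separate characters (an assumption).
function ngramFeatures(text, sizes = [3, 4]) {
  const chars = Array.from(text.replace(/\s+/gu, ""));
  // A Set stores each n-gram once, which is all "present-only" scoring needs.
  const features = new Set();
  for (const n of sizes) {
    for (let i = 0; i + n <= chars.length; i++) {
      features.add(chars.slice(i, i + n).join(""));
    }
  }
  return features;
}

console.log(ngramFeatures("ab cde").size); // → 5 (abc, bcd, cde, abcd, bcde)
```

Overlapping windows mean a spam keyword is captured no matter where the surrounding words begin or end.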
22
53
 
23
- # Quickstart
54
+ ### Phase 3: Present-Only Naive Bayes (The Core)
55
+
56
+ A modified Naive Bayes classifier trained exclusively on real-world spam samples.
57
+
58
+ - **CPU Friendly:** It uses a `Set` to track _only_ the vocabulary features actually present in the user's text and computes log-probabilities for those matches alone, rather than iterating over the entire 28,000+ feature vocabulary.
59
+ - **Dynamic Thresholding:** To prevent false positives on genuinely long, detailed user comments, it applies a length-dependent threshold offset of $5.5 + 0.49 \times N$, where $N$ is the number of matched features. The more features a text matches, the higher the score it must reach to be flagged.
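A minimal sketch of present-only scoring with the dynamic threshold. The `weights` map of per-feature log-likelihood weights and the rule "flag when the summed weight exceeds 5.5 + 0.49 × N" are assumptions about how the pieces fit together, not the published internals; `presentOnlyScore` is a hypothetical name. As a worked example, a text matching N = 10 features would need a score above 5.5 + 0.49 × 10 = 10.4.

```javascript
// Hypothetical present-only Naive Bayes scorer. Assumptions: `weights` maps a
// vocabulary feature to its log-likelihood weight, the score is the sum over
// features present in the text, and N is how many of them were found.
function presentOnlyScore(features, weights) {
  let score = 0;
  let matched = 0;
  // Loop over the text's own features, not the ~28k-entry vocabulary;
  // features absent from the vocabulary contribute nothing.
  for (const feature of features) {
    const w = weights.get(feature);
    if (w !== undefined) {
      score += w;
      matched += 1;
    }
  }
  // Length-dependent threshold: every matched feature raises the bar by 0.49.
  const threshold = 5.5 + 0.49 * matched;
  return { score, matched, threshold, isSpam: score > threshold };
}

const weights = new Map([["abc", 4], ["bcd", 4], ["xyz", -1]]);
console.log(presentOnlyScore(new Set(["abc", "bcd", "zzz"]), weights).isSpam); // → true
console.log(presentOnlyScore(new Set(["abc"]), weights).isSpam); // → false
```

Summing logs instead of multiplying probabilities avoids floating-point underflow, and the per-feature 0.49 term means each extra match must carry real evidence to push the text over the line.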
60
+
61
+ ---
62
+
63
+ ## Security & Active Defense (SAST Compliant)
64
+
65
+ SpamWarden utilizes a **Hostile Active Defense** architecture. It is built to pass enterprise audits while remaining an absolute nightmare for automated botnets.
66
+
67
+ 1. **The Phantom Core (Closure Isolation):** The real detection engine does _not_ exist on the global `window` object. It is sealed entirely inside an anonymous execution closure, so an attacker's script has no global reference through which to query, overwrite, or disable the core function from the browser console.
68
+ 2. **Polymorphic Ghost Tarpits (The Honeypot):** At runtime, the script generates polymorphic decoy engines hidden behind believable frontend variable names (e.g., `window.aBcDeCache`). If a bot bypasses the UI and blindly executes one of these fakes, it is trapped in the bounded PoW `djb2` hash loop.
69
+ 3. **Brutal DOM Protection:** By utilizing Document-Level Capturing Phase listeners and Prototype Monkey-Patching, SpamWarden intercepts malicious submissions _before_ they reach the form element, defeating direct `document.forms[0].submit()` bypasses.
70
+ 4. **Anti-Tamper Lockout:** If a script attempts to strip the `data-sw-protect` targeting attributes off your HTML, a hidden internal `MutationObserver` instantly detects the tampering and permanently disables the form.
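The closure-isolation pattern behind the Phantom Core can be sketched in a few lines. The names `guard`, `detect`, and `check` are hypothetical, and the toy `detect` rule stands in for the real engine.

```javascript
// Hypothetical illustration of closure isolation: the detector is created
// inside an immediately invoked function, so no global name points at it.
const guard = (() => {
  // Private: unreachable as window.detect or globalThis.detect.
  // (Toy rule standing in for the real engine.)
  function detect(text) {
    return /[$€£฿]/u.test(text);
  }
  // Only a frozen, minimal API leaves the closure, so `check` cannot be
  // reassigned on the returned object.
  return Object.freeze({
    check: (text) => ({ isSpam: detect(text) }),
  });
})();

console.log(typeof globalThis.detect); // → undefined
console.log(guard.check("Win ฿500").isSpam); // → true
```

Hiding the reference only removes the global handle; it does not stop someone from editing the page or skipping the script entirely.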
71
+
72
+ ---
73
+
74
+ ## Quickstart
24
75
 
25
76
  > [!IMPORTANT]
26
- > **Are you a Thai government agency or public sector website administrator?**
27
- > Get your free token configuration and drop-in script to protect your online portals from annoying gambling/loan ads and spam campaigns at [redsocs.com/spam-warden](https://redsocs.com/spam-warden).
77
+ > **For Public Sector & Government Admins:** Get your free token configuration and drop-in script to protect your online portals at [redsocs.com/spam-warden](https://redsocs.com/spam-warden).
78
+
79
+ ---
28
80
 
29
81
  ### 1. Zero-Config Local Protection (No Telemetry)
30
82
 
31
- Add this script to your page with the `data-auto-protect` attribute. It will automatically find your most significant forms (using an intelligent heuristic: top 2 forms with >= 2 inputs) and block submission if spam is detected.
83
+ Explicit opt-in protection. No data leaves the browser. Simply include the script and tag your inputs.
32
84
 
33
- By default, this mode also enables PII masking (DLP). To disable PII masking, add `data-sd="0"`.
85
+ **Which file should you choose? (Pick ONLY ONE):**
86
+
87
+ - `spamwarden.min.js`: The standard minified version. Best for general performance and faster browser parse times.
88
+ - `spamwarden.min.ob.js`: The obfuscated version. It applies control-flow flattening and string encoding to aggressively penalize reverse-engineering attempts by malicious actors, at the cost of a slightly larger file size.
34
89
 
35
90
  ```html
36
- <script
37
- src="https://cdn.redsocs.com/js/spamwarden.min.js"
38
- data-auto-protect
39
- ></script>
91
+ <!-- ⚠️ IMPORTANT: Choose ONLY ONE of the scripts below. Do not include both! -->
92
+
93
+ <!-- Option A: Standard Minified (Best Performance) -->
94
+ <script src="https://cdn.redsocs.com/js/spamwarden.min.js"></script>
95
+
96
+ <!-- Option B: Obfuscated (Maximum Security) -->
97
+ <!-- <script src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"></script> -->
98
+
99
+ <form>
100
+ <!-- Just add data-sw-protect="true" to any field -->
101
+ <textarea name="comment" data-sw-protect="true"></textarea>
102
+ <button type="submit">Submit</button>
103
+ </form>
40
104
  ```
41
105
 
42
- ### 2. Enterprise Telemetry (SIEM Integration)
106
+ ---
107
+
108
+ ### 2. Enterprise Telemetry & DLP (SIEM Integration)
43
109
 
44
- If you need to report blocked spam payloads to a central SIEM/SOC, provide a Base64 configuration string via the `endpoint` parameter.
110
+ Report blocked payloads to a central SOC, SIEM, or custom logging server. Use the `siems` attribute to define your receiving endpoint(s). You can provide a single URL or a comma-separated list of multiple URLs to broadcast the telemetry to several destinations simultaneously.
111
+
112
+ Add `data-sd="1"` to enable built-in Data Loss Prevention (DLP), which masks credit card numbers, phone numbers, and email addresses (e.g., `[CARD_MASKED]`) before the payload is transmitted.
113
+
114
+ **Single Endpoint:**
45
115
 
46
116
  ```html
47
- <script src="https://cdn.redsocs.com/js/spamwarden.min.js?endpoint=MHxzaWVtLnJlZHNvY3MuY29tL3Yx"></script>
117
+ <script
118
+ src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"
119
+ siems="siem.redsocs.com/v1"
120
+ data-sd="1"
121
+ ></script>
48
122
  ```
49
123
 
50
- _Note: The `endpoint` parameter is a Base64 encoded string of `sdFlag|siemEndpoint` (e.g., `0|siem.redsocs.com/v1`)._
124
+ **Multiple Endpoints (Comma-separated):**
125
+
126
+ ```html
127
+ <script
128
+ src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"
129
+ siems="api-spam.siem.go.th/v1?token=[token],siem-logger.yourdomain.com/logs"
130
+ data-sd="1"
131
+ ></script>
132
+ ```
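A minimal sketch of DLP masking and `siems` splitting, assuming regex-based detection. Only `[CARD_MASKED]` appears in this README; `[PHONE_MASKED]`, `[EMAIL_MASKED]`, `maskPII`, and `parseSiems` are hypothetical, and the patterns (including the Thai-style phone format) are deliberately simple rather than production-grade.

```javascript
// Hypothetical DLP masking for data-sd="1". Only [CARD_MASKED] is documented;
// the other labels and all patterns are simplified assumptions.
const PII_PATTERNS = [
  // Email addresses (masked first so their digits are left alone).
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[EMAIL_MASKED]"],
  // 13 to 19 digits, optionally separated by spaces or dashes (card numbers).
  [/\b(?:\d[ -]?){12,18}\d\b/g, "[CARD_MASKED]"],
  // Thai-style phone numbers such as 081-234-5678 or 02 123 4567.
  [/\b0\d{1,2}[- ]?\d{3}[- ]?\d{4}\b/g, "[PHONE_MASKED]"],
];

function maskPII(text) {
  return PII_PATTERNS.reduce((out, [pattern, label]) => out.replace(pattern, label), text);
}

// Hypothetical parser for a comma-separated `siems` attribute value.
function parseSiems(value) {
  return (value || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter(Boolean);
}

console.log(maskPII("Pay 4111 1111 1111 1111 or call 081-234-5678"));
// → Pay [CARD_MASKED] or call [PHONE_MASKED]
console.log(parseSiems("api-spam.siem.go.th/v1?token=[token],siem-logger.yourdomain.com/logs").length); // → 2
```

Masking cards before phones matters: a 16-digit card number would otherwise be partly consumed by the shorter phone pattern.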
51
133
 
52
134
  ### 3. API Usage (Node Only)
53
135
 
54
136
  ```javascript
55
- const result = spamwarden.spamcheck(
56
- "[Hello, this is a Thai casino & scam ads — and guess what? Your tax pays for my traffic.]",
137
+ const sw = require("@redsocs/spam-warden");
138
+
139
+ const result = sw.spamcheck(
140
+ "[Hello, this is a Thai casino & scam ads — guess what? Your tax pays for my traffic.]",
57
141
  );
142
+
58
143
  if (result.isSpam) {
59
144
  console.log("Blocked:", result.reason || "AI match");
60
145
  console.log("Confidence:", result.prob);
61
146
  }
62
147
  ```
63
148
 
64
- # Scope
65
-
66
- SpamWarden is designed for **interactive web elements**:
67
-
68
- - **Contact Forms:** Prevent bot and manual spam submissions.
69
- - **Comment Sections:** Real-time feedback for users before they post.
70
- - **Chat Inputs:** Instant filtering of malicious links and currency-heavy spam.
71
- - **Privacy-First Apps:** Since detection happens locally, user data doesn't leave the browser unless explicitly reported.
72
-
73
- # What's inside?
74
-
75
- - **Hybrid Detection Engine:**
76
- - **Hard Rules:** Instant blocking for currency symbols (`$€£฿`) and known spam link patterns (`line[dot]me`, `bit[dot]ly`).
77
- - **Thai-Optimized Tokenizer:** Extracts whitespace tokens, **trigrams**, and **quadgrams** to handle the space-less nature of the Thai language.
78
- - **Present-Only NB Classifier:** A modified Naive Bayes model trained on real-world spam samples. It only evaluates present vocabulary features and utilizes a length-dependent threshold offset ($5.5 + 0.49 \times N$ matched features) to calibrate confidence and prevent false positives on longer clean texts.
79
- - **Telemetry System:** Optional auto-reporting of spam hits to `api.redsocs.com` for global threat intelligence.
80
- - **Auto-Interceptor:** Event listeners that hook into DOM forms to provide "Drop-in" protection.
81
-
82
- # Why this exists?
149
+ ---
83
150
 
84
- Traditional spam filters (like Akismet or ReCaptcha) often:
151
+ ## Scope & Backend Requirements
85
152
 
86
- 1. Require a round-trip to a server (latency).
87
- 2. Are expensive for high-volume sites.
88
- 3. Over-collect user data (privacy concerns).
89
- 4. Struggle with specific Thai-language spam patterns.
153
+ SpamWarden is designed for **interactive web elements**: Contact Forms, Comment Sections, and Chat Inputs.
90
154
 
91
- **SpamWarden** exists to provide a **local, fast, and Thai-centric** alternative that stops spam at the source: the user's input field.
155
+ > [!WARNING]
156
+ > **Client-Side Limits:** All client-side code is inherently bypassable by a sufficiently motivated, manual human attacker. If you require absolute security, you **must** validate payloads on your backend.
157
+ >
158
+ > - **For WordPress:** Use our [SpamWarden WP Plugin](https://redsocs.com/spam-warden) to protect your server at the PHP layer.
159
+ > - **For Custom Stacks (Node):** Grab this NPM package directly, bundle it internally, and run the `spamcheck()` function on your backend server before hitting your database.
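A framework-agnostic sketch of that backend check, assuming `spamcheck(text)` returns `{ isSpam, reason, prob }` as in the API Usage example. `screenSubmission` and the 422 status code are illustrative choices; the checker is passed in so the gate can be exercised without the package installed.

```javascript
// Hypothetical backend gate. `spamcheck` is any function shaped like the
// package's spamcheck(text) call: it returns { isSpam, reason, prob }.
// The 422 status is an illustrative choice, not part of the library.
function screenSubmission(text, spamcheck) {
  const result = spamcheck(String(text ?? ""));
  if (result.isSpam) {
    return { accepted: false, status: 422, reason: result.reason || "AI match" };
  }
  return { accepted: true, status: 200, reason: null };
}

// Example wiring in a request handler (hypothetical `db.insertComment`):
//   const sw = require("@redsocs/spam-warden");
//   const verdict = screenSubmission(req.body.comment, (t) => sw.spamcheck(t));
//   if (!verdict.accepted) return res.status(verdict.status).json(verdict);
//   await db.insertComment(req.body.comment);

// Stub checker so the sketch runs without the package installed.
const stubCheck = (t) => ({ isSpam: t.includes("฿"), reason: null, prob: t.includes("฿") ? 1 : 0 });
console.log(screenSubmission("Win ฿500", stubCheck).reason); // → AI match
```

Wrapping the call as `(t) => sw.spamcheck(t)` keeps it working even if `spamcheck` relies on its module object as `this`.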
92
160
 
93
- # Security & Active Defense
161
+ ---
94
162
 
95
- > [!WARNING]
96
- > **Honesty First:** All client-side code is inherently bypassable by a sufficiently motivated human. However, we have engineered this library to be an absolute nightmare for automated bots and script kiddies.
163
+ ## Local Simulation & Testing
97
164
 
98
- We do not rely solely on "Security through Obscurity." SpamWarden employs a **Hostile Active Defense** architecture:
165
+ Spin up a local simulation server to test the DOM auto-blocking behavior and inspect SIEM telemetry payloads in real time:
99
166
 
100
- 1. **The Ghost Tarpit (Honeypot):** We intentionally deploy a "Poison Pill" decoy. If a bot or attacker attempts to bypass or tamper with the script, they are redirected into this trap, which is designed to actively retaliate by crashing headless browsers (Puppeteer/Playwright) and wasting attacker compute credits.
101
- 2. **Build-Time Randomization (The Moving Target):** The real machine-learning engine is hidden inside an isolated closure and bound to the DOM using a randomized cryptographic key generated during compilation. The internal execution path changes on every release, defeating static bypass scripts.
102
- 3. **Brutal DOM Protection:** By utilizing Document-Level Capturing Phase listeners, Prototype Monkey-Patching, and MutationObservers, SpamWarden intercepts submissions before they reach the form element. This defeats trivial bypasses like form cloning or direct `document.forms[0].submit()` calls.
103
- 4. **Aggressive Obfuscation:** The final distribution is run through proprietary, high-entropy obfuscation routines to protect the model weights and heavily penalize reverse engineering attempts.
167
+ 1. **Start the server**: `npm run test-server`
168
+ 2. **Open the test page** in your browser:
169
+ [http://localhost:3000/](http://localhost:3000/)
170
+ 3. **Submit a spam message** (e.g., including currency signs like `฿` or links like `line[dot]me`).
171
+ 4. **Observe the result**:
104
172
 
105
- If you require absolute, mathematically unbroken security, client-side protection will never be enough. You **must** validate payloads on your backend:
173
+ - The form submission will be blocked on the page.
174
+ - The terminal will display the defanged and sanitized telemetry payload sent to the SIEM receiver:
175
+
176
+ ```text
177
+ 🚨 [SIEM RECEIVER] Blocked Payload Received!
178
+ ================================================
179
+ Endpoint: siem.gov-sec.go.th/v1?token=eGuec...
180
+ URL: h_tt_p://victim.go.th:3000/
181
+ Rule Matched: currency_symbol
182
+ Confidence: 100%
183
+ PII Masked? true
184
+ Pasted? false
185
+ Actors: [[at]TUNA_FISH]
186
+ Sanitized: "Win [CARD_MASKED] now! [at]TUNA_FISH"
187
+ ================================================
188
+ ```
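The defanging visible in the sample log can be approximated as below. These rules are inferred from the output above (`h_tt_p://`, `[at]TUNA_FISH`), not taken from the library source; `defang` and `extractActors` are hypothetical names, and the sketch assumes emails were already masked so they are not picked up as handles.

```javascript
// Hypothetical defanging inferred from the sample log above: "http" becomes
// "h_tt_p" and "@" becomes "[at]", so logged URLs and handles are not live.
function defang(text) {
  return text.replace(/http/gi, "h_tt_p").replace(/@/g, "[at]");
}

// Collect unique @handles, then defang them. Assumes emails were already
// masked, otherwise the domain part of an address would look like a handle.
function extractActors(text) {
  const handles = text.match(/@[A-Za-z0-9_]+/g) || [];
  return [...new Set(handles)].map(defang);
}

const raw = "Win [CARD_MASKED] now! @TUNA_FISH";
console.log(defang(raw)); // → Win [CARD_MASKED] now! [at]TUNA_FISH
console.log(extractActors(raw).join(", ")); // → [at]TUNA_FISH
```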
106
189
 
107
- - **For WordPress:** Use our [SpamWarden WP Plugin](https://redsocs.com/spam-warden) to protect your server at the PHP layer (Paid).
108
- - **For Node.js/Custom Stacks:** Grab this NPM package directly, bundle it internally, and run the `spamcheck()` function on your backend server before hitting your database (Free).
190
+ **_If the script tag has no telemetry endpoint configured, SpamWarden sends nothing outside the browser._**
109
191
 
110
- # Local Simulation & Testing
192
+ ---
111
193
 
112
- You can spin up a local simulation server to test the DOM auto-blocking behavior and inspect the SIEM telemetry payloads in real time:
194
+ ## About
113
195
 
114
- 1. **Start the simulation server**:
115
- ```bash
116
- npm run test-server
117
- ```
118
- 2. **Open the test page** in your browser:
119
- [http://localhost:3000/](http://localhost:3000/)
120
- 3. **Submit a spam message** (e.g., including currency signs like `฿` or links like `line[dot]me`).
121
- 4. **Observe the result**:
122
- - The form submission will be blocked on the page.
123
- - The terminal will display the defanged and sanitized telemetry payload sent to the SIEM receiver:
124
- ```text
125
- 🚨 [SIEM RECEIVER] Blocked Payload Received!
126
- ================================================
127
- Endpoint Token: MXxodHRwOi8vbG9jYWxob3N0OjMwMDAvdjEvdGVsZW1ldHJ5
128
- URL: h_tt_p://localhost:3000/
129
- Rule Matched: currency_symbol
130
- Confidence: 100%
131
- PII Masked? false
132
- Pasted? false
133
- Actors: []
134
- Sanitized: "Win [CARD_MASKED] now!"
135
- ================================================
136
- ```
137
-
138
- # About
139
-
140
- - **Version:** 1.1.11 (Engine v11.06)
196
+ - **Version:** 1.3.0 (Engine v11.06)
141
197
  - **Author:** [RedSocs](https://github.com/RedSocs)
142
198
  - **License:** MIT
143
- - **Model Origin:** Trained via [RedSocs/spam-labeler](https://github.com/RedSocs/spam-labeler)
144
- - **Inquiries & Enterprise Support:** [pichit[at]redsocs.com](mailto:pichit@redsocs.com)
199
+ - **Inquiries & Enterprise Support:** [pichit[at]redsocs.com](mailto:pichit@redsocs.com)
145
200
  - **Sponsor:** [Buy Me a Coffee](https://buymeacoffee.com/redsocs?new=1)
146
201
 
147
202
  ---
148
203
 
149
204
  ### Technical Specs
150
205
 
151
- | Property | Value |
152
- | ----------------- | ------------------------- |
153
- | **Minified Size** | ~2.0 MB (including model) |
154
- | **Gzipped Size** | **~341 KB** |
155
- | **Dependencies** | 0 (Vanilla JS) |
156
- | **Vocabulary** | 28106 features |
206
+ | Property | Value |
207
+ | ------------------- | --------------------------------- |
208
+ | **Minified Size** | ~2.0 MB (including model weights) |
209
+ | **Gzipped Size** | **~341 KB** |
210
+ | **Dependencies** | 0 (Vanilla JS) |
211
+ | **Vocabulary** | 28,106 features |