mask-privacy 3.3.0 → 3.4.0

package/README.md CHANGED
@@ -127,27 +127,18 @@ Performance-sensitive deployments utilize the built-in `LocalTransformersScanner
  ### 7. Sub-string Detokenization
  Mask includes the ability to detokenize PII embedded within larger text blocks (like email bodies or chat messages). `detokenizeText()` uses high-performance regex to find and restore all tokens within a paragraph before they hit your tools.

- ## Multilingual PII Detection (Waterfall Pipeline)
+ ## Multilingual PII Detection (2-Tier Waterfall)

- Mask is built for the global enterprise. The TypeScript SDK implements a **3-Tier Waterfall Detection** strategy for high-precision PII detection in **English and Spanish** using local ONNX models.
-
- ### Supported Language Matrix
-
- Mask provides first-class support for the following languages:
-
- | Language | Code | Tier 0 (DLP) | Tier 2 (NLP Engine) |
- | :--- | :--- | :--- | :--- |
- | **English** | `en` | ✅ Full | DistilBERT (Simple) |
- | **Spanish** | `es` | ✅ Full | BERT Multilingual |
+ Mask is built for the global enterprise. The TypeScript SDK implements a **2-Tier Model-Augmented Waterfall** strategy for high-precision PII detection in **English and Spanish**.

  ### How the Waterfall Works: The Excising Mechanism

- To maintain high performance, the TypeScript SDK does not simply run three separate scans. It uses a **Sequential Mutation** strategy:
+ To maintain high performance, the TypeScript SDK does not simply run multiple separate scans. It uses a **Sequential Mutation** strategy:

- 1. **Tier 0 & 1 (The Scouts):** The SDK first runs the high-speed DLP and Regex engines synchronously in the main thread.
- 2. **Immediate Tokenization:** Any PII found by these tiers is **immediately replaced** by a token in the string buffer.
- 3. **Tier 2 (The Heavy Infantry):** The expensive NLP engine (Transformers.js) only scans the *remaining* text. Because the PII has already been "excised" (cut out and replaced with tokens), the NLP engine doesn't waste compute on data already identified.
- 4. **Bypass Logic:** All tiers are "token-aware." If a scan encounter a string that is already a Mask token, it skips it entirely, preventing redundant processing or "double-tokenization."
+ 1. **Tier 0: Deterministic (The Registry):** The SDK first runs the high-speed DLP and Registry engines. These use regex + checksums (Luhn, Mod-97, Mod-11) + Proximity Keywords to identify structured PII (Bank Accounts, SSNs, DNI, NUSS, etc.) with 100% precision.
+ 2. **Immediate Tokenization:** Any PII found by Tier 0 is **immediately replaced** by a token in the string buffer.
+ 3. **Tier 1: Probabilistic (Neural NER):** The expensive NLP engine (Transformers.js) only scans the *remaining* text for unstructured entities: **PERSON**, **LOCATION**, and **ORGANIZATION**. Because Tier 0 PII has already been "excised", the NLP engine doesn't waste compute on data already identified, and entity collisions are avoided.
+ 4. **Bypass Logic:** All tiers are "token-aware." If a scan encounters a string that is already a Mask token, it skips it entirely.

  ---
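Reviewer note: the 2-tier flow described in the new README text can be sketched roughly as follows. This is a minimal illustration, not the package's implementation. The token shape `[MASK_<TYPE>_<n>]`, the names `tier0`, `tier1`, `maskText`, and `stubNer`, and the card-number regex are all assumptions made for the example; the real token format and scanner internals are not visible in this diff. The neural tier is replaced by a stub so the sketch runs without Transformers.js.

```typescript
// Hypothetical token shape; the real Mask token format is not shown in this diff.
const TOKEN_SOURCE = "\\[MASK_[A-Z_]+_\\d+\\]";

type Finding = { type: string; value: string; token: string };
type Ner = (segment: string) => { type: string; value: string }[];

// Luhn checksum, one of the Tier 0 validators the README names.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Tier 0: a regex proposes candidates, the checksum confirms them, and each
// confirmed match is replaced by a token immediately (the "excising" step).
function tier0(text: string, findings: Finding[]): string {
  return text.replace(/\b\d{13,19}\b/g, (match) => {
    if (!luhnValid(match)) return match;
    const token = `[MASK_CARD_${findings.length}]`;
    findings.push({ type: "CARD", value: match, token });
    return token;
  });
}

// Tier 1: the (stubbed) NER engine only sees text between existing tokens.
function tier1(text: string, ner: Ner, findings: Finding[]): string {
  // Splitting on a capturing group keeps the tokens at odd indices.
  const parts = text.split(new RegExp(`(${TOKEN_SOURCE})`));
  return parts
    .map((part, i) => {
      if (i % 2 === 1) return part; // already a token: bypass
      let out = part;
      for (const ent of ner(part)) {
        const token = `[MASK_${ent.type}_${findings.length}]`;
        findings.push({ ...ent, token });
        out = out.split(ent.value).join(token);
      }
      return out;
    })
    .join("");
}

function maskText(text: string, ner: Ner): { masked: string; findings: Finding[] } {
  const findings: Finding[] = [];
  const afterTier0 = tier0(text, findings);
  return { masked: tier1(afterTier0, ner, findings), findings };
}

// Stand-in for the neural model: recognizes one hard-coded name.
const stubNer: Ner = (segment) =>
  segment.includes("Ana García") ? [{ type: "PERSON", value: "Ana García" }] : [];

const result = maskText("Ana García paid with 4111111111111111 yesterday.", stubNer);
console.log(result.masked);
// [MASK_PERSON_1] paid with [MASK_CARD_0] yesterday.
```

The point of the ordering is that `stubNer` is never handed the card number: by the time Tier 1 runs, that span is already a token and is skipped by the split.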
package/dist/index.d.mts CHANGED
@@ -481,7 +481,7 @@ declare class DLPConfidenceScorer {
  * Provides format-preserving encryption, local/distributed vaulting,
  * and framework-agnostic tool interception hooks.
  */
- declare const VERSION = "3.3.0";
+ declare const VERSION = "3.4.0";

  /**
  * Detect PII entities in text and return a list of objects with metadata.
package/dist/index.d.ts CHANGED
@@ -481,7 +481,7 @@ declare class DLPConfidenceScorer {
  * Provides format-preserving encryption, local/distributed vaulting,
  * and framework-agnostic tool interception hooks.
  */
- declare const VERSION = "3.3.0";
+ declare const VERSION = "3.4.0";

  /**
  * Detect PII entities in text and return a list of objects with metadata.
package/dist/index.js CHANGED
@@ -59619,7 +59619,7 @@ init_handlers();
  init_scorer();

  // src/index.ts
- var VERSION = "3.3.0";
+ var VERSION = "3.4.0";
  async function detectEntitiesWithConfidence(text, options = {}) {
    const scanner = getScanner();
    return await scanner.scanAndReturnEntities(text, options);