openredaction 1.0.9 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # OpenRedaction
 
- Production-ready PII detection and redaction library with 571+ built-in patterns, multiple redaction modes, compliance presets, enterprise SaaS features, and zero dependencies.
+ Production-ready PII detection and redaction library with 571+ built-in patterns, multiple redaction modes, compliance presets, and optional enterprise-style modules. The published package lists **no required runtime dependencies**; optional peers (e.g. React, PDF) apply only when you use those integrations.
 
  ## Installation
 
@@ -36,59 +36,12 @@ import { useOpenRedaction, usePIIDetector } from 'openredaction/react';
 
  `react` is an optional peer dependency; only install it if you use the React entry.
 
- ## Optional AI Assist
+ ## Node HTTP API & Prometheus (optional)
 
- OpenRedaction supports an optional AI-assisted detection mode that enhances regex-based detection by calling a hosted AI endpoint. This feature is **OFF by default** and requires explicit configuration.
-
- ### Configuration
-
- ```typescript
- import { OpenRedaction } from 'openredaction';
-
- const detector = new OpenRedaction({
- // ... other options ...
- ai: {
- enabled: true,
- endpoint: 'https://your-api.example.com' // Optional: defaults to OPENREDACTION_AI_ENDPOINT env var
- }
- });
-
- const result = await detector.detect('Contact John Doe at john@example.com');
- ```
-
- ### How It Works
-
- 1. **Regex Detection First**: The library always runs regex detection first (existing behavior)
- 2. **AI Enhancement**: If `ai.enabled === true` and an endpoint is configured, the library calls the `/ai-detect` endpoint
- 3. **Smart Merging**: AI entities are merged with regex detections, with regex taking precedence on conflicts
- 4. **Graceful Fallback**: If the AI endpoint fails or is unavailable, the library silently falls back to regex-only detection
-
- ### Environment Variables
-
- In Node.js environments, you can set the endpoint via environment variable:
-
- ```bash
- export OPENREDACTION_AI_ENDPOINT=https://your-api.example.com
- ```
-
- ### Important Notes
-
- - **AI is optional**: The library works exactly as before when `ai.enabled` is `false` or omitted
- - **Regex is primary**: AI only adds additional entities; regex detections always take precedence
- - **No breaking changes**: When AI is disabled, detection is still regex-only; `detect()` always returns a `Promise`
- - **Browser support**: In browsers, you must provide an explicit `ai.endpoint` (env vars not available)
- - **Network dependency**: AI mode requires network access to the endpoint
-
- ### For Sensitive Workloads
-
- For maximum security and privacy, keep AI disabled and rely purely on regex detection:
+ `APIServer`, `createAPIServer`, `PrometheusServer`, and `createPrometheusServer` use Node’s built-in `http` module. They are **not** re-exported from the main entry (`openredaction`) so the default bundle stays free of `node:http` for clearer static analysis.
 
  ```typescript
- const detector = new OpenRedaction({
- // AI not configured = pure regex detection
- includeNames: true,
- includeEmails: true
- });
+ import { APIServer, createPrometheusServer } from 'openredaction/server';
  ```
 
  ## Documentation
@@ -101,7 +54,7 @@ const detector = new OpenRedaction({
  - 🚀 **Fast & Accurate** - 10-20ms for 2-3KB text
  - 🎯 **571+ PII Patterns** - Comprehensive coverage across multiple categories
  - 🔐 **Enterprise SaaS Ready** - Multi-tenancy, persistent audit logging, webhooks, REST API
- - 📊 **Production Monitoring** - Prometheus metrics, Grafana dashboards, health checks
+ - 📊 **Production Monitoring** - In-memory metrics collector; optional Prometheus HTTP server via `openredaction/server`
  - 🧠 **Semantic Detection** - Hybrid NER + regex with 40+ contextual rules
  - 🎨 **Multiple Redaction Modes** - Placeholder, mask-middle, mask-all, format-preserving, token-replace
  - ✅ **Built-in Validators** - Luhn, IBAN, NHS, National ID checksums
@@ -109,9 +62,9 @@ const detector = new OpenRedaction({
  - 🎭 **Deterministic Placeholders** - Consistent redaction for same values
  - 🌍 **Global Coverage** - 50+ countries
  - 📄 **Structured Data Support** - JSON, CSV, XLSX with path/cell tracking
- - 🌳 **Zero Dependencies** - No external packages required (core)
+ - 🌳 **No required runtime deps** - Core redaction does not pull mandatory npm packages
  - 📝 **TypeScript Native** - Full type safety and IntelliSense
- - 🧪 **Battle Tested** - 276+ passing tests
+ - 🧪 **Battle Tested** - Large automated test suite
 
  ## Pattern Categories
 
@@ -137,9 +90,9 @@ Retail, Legal, Real Estate, Logistics, Insurance, Healthcare, Emergency Response
 
  - **Persistent Audit Logging** - SQLite/PostgreSQL with cryptographic hashing
  - **Multi-Tenancy** - Tenant isolation, quotas, usage tracking
- - **Prometheus Metrics** - HTTP server with Grafana dashboards
+ - **Prometheus Metrics** - Optional scrape endpoint (`openredaction/server`)
  - **Webhook System** - Event-driven alerts with retry logic
- - **REST API** - Production-ready HTTP API with authentication
+ - **REST API** - Optional HTTP API (`openredaction/server`)
 
  ## License
 
@@ -11081,6 +11081,38 @@ const transportLogisticsPreset = {
  ]
  };
  /**
+ * PCI-DSS oriented preset - cardholder data and common payment identifiers
+ */
+ const pciDssPreset = {
+ includeNames: true,
+ includeEmails: true,
+ includePhones: true,
+ includeAddresses: true,
+ categories: [
+ "personal",
+ "contact",
+ "financial",
+ "network"
+ ]
+ };
+ /**
+ * SOC 2 oriented preset - broad PII and credentials for trust services contexts
+ */
+ const soc2Preset = {
+ includeNames: true,
+ includeEmails: true,
+ includePhones: true,
+ includeAddresses: true,
+ categories: [
+ "personal",
+ "contact",
+ "financial",
+ "government",
+ "network",
+ "digital-identity"
+ ]
+ };
+ /**
  * Get preset configuration by name
  */
  function getPreset(name) {
@@ -11097,6 +11129,10 @@ function getPreset(name) {
  case "transport-logistics":
  case "transportation":
  case "logistics": return transportLogisticsPreset;
+ case "pci-dss":
+ case "pci_dss": return pciDssPreset;
+ case "soc2":
+ case "soc-2": return soc2Preset;
  default: return {};
  }
  }
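The new fall-through `case` labels give each compliance preset two accepted spellings. The sketch below mirrors that lookup so the alias behaviour can be checked in isolation. It is illustrative only: `PresetConfig`, the `PRESETS` map and `lookupPreset` are hypothetical names, not package exports, and the sketch assumes names are matched exactly as written (the start of the real `getPreset` switch is not part of this diff, so any normalisation it performs is unknown). The flag values and category lists are copied from the hunks above.

```typescript
// Hypothetical preset shape; the package's own typing may differ.
interface PresetConfig {
  includeNames?: boolean;
  includeEmails?: boolean;
  includePhones?: boolean;
  includeAddresses?: boolean;
  categories?: string[];
}

// Values copied from the pciDssPreset and soc2Preset objects added in 1.1.0.
const pciDssPreset: PresetConfig = {
  includeNames: true,
  includeEmails: true,
  includePhones: true,
  includeAddresses: true,
  categories: ["personal", "contact", "financial", "network"],
};

const soc2Preset: PresetConfig = {
  includeNames: true,
  includeEmails: true,
  includePhones: true,
  includeAddresses: true,
  categories: ["personal", "contact", "financial", "government", "network", "digital-identity"],
};

// Every alias maps to the same object, like the fall-through case labels.
const PRESETS = new Map<string, PresetConfig>([
  ["pci-dss", pciDssPreset],
  ["pci_dss", pciDssPreset],
  ["soc2", soc2Preset],
  ["soc-2", soc2Preset],
]);

// Unknown names yield an empty config, matching `default: return {};`.
function lookupPreset(name: string): PresetConfig {
  return PRESETS.get(name) ?? {};
}

console.log(lookupPreset("pci_dss") === lookupPreset("pci-dss")); // true
console.log(lookupPreset("soc-2").categories?.join(", "));
```

Both presets enable the same four `include*` flags; SOC 2 differs only by adding the `government` and `digital-identity` categories to the PCI-DSS list.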
@@ -11615,7 +11651,7 @@ var ConfigLoader = class {
  static createDefaultConfig(outputPath = ".openredaction.config.js") {
  fs.writeFileSync(outputPath, `/**
  * OpenRedaction Configuration
- * @see https://github.com/openredact/openredact
+ * @see https://github.com/sam247/openredaction
  */
  export default {
  // Extend built-in presets
@@ -14282,133 +14318,6 @@ function validatePattern(pattern) {
  }
  }
 
- //#endregion
- //#region src/utils/ai-assist.ts
- /**
- * Get the AI endpoint URL from options or environment
- */
- function getAIEndpoint(aiOptions) {
- if (!aiOptions?.enabled) return null;
- if (aiOptions.endpoint) return aiOptions.endpoint;
- if (typeof process !== "undefined" && process.env) {
- const envEndpoint = process.env.OPENREDACTION_AI_ENDPOINT;
- if (envEndpoint) return envEndpoint;
- }
- return null;
- }
- /**
- * Check if fetch is available in the current environment
- */
- function isFetchAvailable() {
- return typeof fetch !== "undefined";
- }
- /**
- * Call the AI endpoint to get additional PII entities
- * Returns null if AI is disabled, endpoint unavailable, or on error
- */
- async function callAIDetect(text, endpoint, debug) {
- if (!isFetchAvailable()) {
- if (debug) console.warn("[OpenRedaction] AI assist requires fetch API. Not available in this environment.");
- return null;
- }
- try {
- const url = endpoint.endsWith("/ai-detect") ? endpoint : `${endpoint}/ai-detect`;
- if (debug) console.log(`[OpenRedaction] Calling AI endpoint: ${url}`);
- const response = await fetch(url, {
- method: "POST",
- headers: { "Content-Type": "application/json" },
- body: JSON.stringify({ text })
- });
- if (!response.ok) {
- if (debug) {
- const statusText = response.status === 429 ? "Rate limit exceeded (429)" : `${response.status}: ${response.statusText}`;
- console.warn(`[OpenRedaction] AI endpoint returned ${statusText}`);
- }
- return null;
- }
- const data = await response.json();
- if (!data.entities || !Array.isArray(data.entities)) {
- if (debug) console.warn("[OpenRedaction] Invalid AI response format: missing entities array");
- return null;
- }
- return data.entities;
- } catch (error) {
- if (debug) console.warn(`[OpenRedaction] AI endpoint error: ${error instanceof Error ? error.message : "Unknown error"}`);
- return null;
- }
- }
- /**
- * Validate an AI entity
- */
- function validateAIEntity(entity, textLength) {
- if (!entity.type || !entity.value || typeof entity.start !== "number" || typeof entity.end !== "number") return false;
- if (entity.start < 0 || entity.end < 0 || entity.start >= entity.end) return false;
- if (entity.start >= textLength || entity.end > textLength) return false;
- if (entity.value.length !== entity.end - entity.start) return false;
- return true;
- }
- /**
- * Check if two detections overlap significantly
- * Returns true if they overlap by more than 50% of the shorter detection
- */
- function detectionsOverlap(det1, det2) {
- const [start1, end1] = det1.position;
- const [start2, end2] = det2.position;
- const overlapStart = Math.max(start1, start2);
- const overlapEnd = Math.min(end1, end2);
- if (overlapStart >= overlapEnd) return false;
- const overlapLength = overlapEnd - overlapStart;
- const length1 = end1 - start1;
- const length2 = end2 - start2;
- return overlapLength > Math.min(length1, length2) * .5;
- }
- /**
- * Convert AI entity to PIIDetection format
- */
- function convertAIEntityToDetection(entity, text) {
- if (!validateAIEntity(entity, text.length)) return null;
- const actualValue = text.substring(entity.start, entity.end);
- let type = entity.type.toUpperCase();
- if (type.includes("EMAIL") || type === "EMAIL_ADDRESS") type = "EMAIL";
- else if (type.includes("PHONE") || type === "PHONE_NUMBER") type = "PHONE_US";
- else if (type.includes("NAME") || type === "PERSON") type = "NAME";
- else if (type.includes("SSN") || type === "SOCIAL_SECURITY_NUMBER") type = "SSN";
- else if (type.includes("ADDRESS")) type = "ADDRESS_STREET";
- let severity = "medium";
- if (type === "SSN" || type === "CREDIT_CARD") severity = "critical";
- else if (type === "EMAIL" || type === "PHONE_US" || type === "NAME") severity = "high";
- return {
- type,
- value: actualValue,
- placeholder: `[${type}_${Math.random().toString(36).substring(2, 9)}]`,
- position: [entity.start, entity.end],
- severity,
- confidence: entity.confidence ?? .7
- };
- }
- /**
- * Merge AI entities with regex detections
- * Prefers regex detections on conflicts
- */
- function mergeAIEntities(regexDetections, aiEntities, text) {
- const merged = [...regexDetections];
- const processedRanges = regexDetections.map((d) => d.position);
- for (const aiEntity of aiEntities) {
- const detection = convertAIEntityToDetection(aiEntity, text);
- if (!detection) continue;
- let hasOverlap = false;
- for (const regexDet of regexDetections) if (detectionsOverlap(regexDet, detection)) {
- hasOverlap = true;
- break;
- }
- if (!hasOverlap) {
- merged.push(detection);
- processedRanges.push(detection.position);
- }
- }
- return merged;
- }
-
  //#endregion
  //#region src/config/ConfigExporter.ts
  var ConfigExporter_exports = /* @__PURE__ */ __exportAll({
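For anyone auditing what 1.1.0 removes: the old `mergeAIEntities` kept every regex detection and appended an AI entity only if it did not overlap any regex detection by more than half of the shorter span (`detectionsOverlap`). The TypeScript restatement below isolates that rule so it can be exercised without a network endpoint. The `Span` type and both function names are hypothetical, and the validation and type mapping done by `convertAIEntityToDetection` are intentionally left out.

```typescript
// Hypothetical minimal detection: a label plus a [start, end) character range.
interface Span {
  type: string;
  position: [number, number];
}

// True when the ranges share more than 50% of the shorter range,
// the same test as the removed detectionsOverlap().
function overlapsSignificantly(a: Span, b: Span): boolean {
  const [startA, endA] = a.position;
  const [startB, endB] = b.position;
  const overlapStart = Math.max(startA, startB);
  const overlapEnd = Math.min(endA, endB);
  if (overlapStart >= overlapEnd) return false; // disjoint or merely touching
  const overlap = overlapEnd - overlapStart;
  return overlap > Math.min(endA - startA, endB - startB) * 0.5;
}

// Regex spans always win. An extra span is kept only if it does not significantly
// overlap any regex span; extras are not compared with each other.
function mergePreferringRegex(regex: Span[], extra: Span[]): Span[] {
  const merged = [...regex];
  for (const candidate of extra) {
    if (!regex.some((r) => overlapsSignificantly(r, candidate))) merged.push(candidate);
  }
  return merged;
}

const regex: Span[] = [{ type: "EMAIL", position: [11, 27] }];
const extra: Span[] = [
  { type: "EMAIL", position: [12, 27] }, // 15 of 15 chars shared: dropped
  { type: "NAME", position: [0, 8] }, // disjoint: kept
  { type: "NAME", position: [25, 40] }, // 2 of 15 chars shared: kept
];
const merged = mergePreferringRegex(regex, extra);
console.log(merged.map((s) => s.type).join(",")); // EMAIL,NAME,NAME
```

The third candidate shows why the rule is "significant" overlap rather than any overlap: a two-character intersection with a 15-character span does not cross the 50% threshold, so both spans survive.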
@@ -16553,7 +16462,7 @@ var OpenRedaction = class OpenRedaction {
  redactionMode: "placeholder",
  enableContextAnalysis: true,
  confidenceThreshold: .5,
- enableFalsePositiveFilter: false,
+ enableFalsePositiveFilter: true,
  falsePositiveThreshold: .7,
  enableMultiPass: false,
  multiPassCount: 3,
@@ -16782,8 +16691,9 @@ var OpenRedaction = class OpenRedaction {
  throw error;
  }
  }
- if (this.nerDetector && detections.length > 0) {
- const piiMatches = detections.map((det) => ({
+ if (this.nerDetector && this.nerDetector.isAvailable()) {
+ const nerMatches = this.nerDetector.detect(text);
+ let piiMatches = detections.map((det) => ({
  type: det.type,
  value: det.value,
  start: det.position[0],
@@ -16794,11 +16704,43 @@ var OpenRedaction = class OpenRedaction {
  after: text.substring(det.position[1], Math.min(text.length, det.position[1] + 50))
  }
  }));
- const hybridMatches = this.nerDetector.hybridDetection(piiMatches, text);
- detections = detections.map((det, index) => ({
- ...det,
- confidence: hybridMatches[index].confidence
- }));
+ if (detections.length > 0) {
+ const hybridMatches = this.nerDetector.hybridDetection(piiMatches, text);
+ detections = detections.map((det, index) => ({
+ ...det,
+ confidence: hybridMatches[index].confidence
+ }));
+ piiMatches = detections.map((det) => ({
+ type: det.type,
+ value: det.value,
+ start: det.position[0],
+ end: det.position[1],
+ confidence: det.confidence || 1,
+ context: {
+ before: text.substring(Math.max(0, det.position[0] - 50), det.position[0]),
+ after: text.substring(det.position[1], Math.min(text.length, det.position[1] + 50))
+ }
+ }));
+ }
+ const nerOnly = this.nerDetector.extractNEROnly(nerMatches, piiMatches);
+ for (const ner of nerOnly) {
+ const syntheticPattern = {
+ type: `NER_${ner.type}`,
+ regex: /.^/,
+ priority: 1,
+ placeholder: `[NER_${ner.type}_{n}]`,
+ severity: "medium"
+ };
+ const placeholder = this.generatePlaceholder(ner.text, syntheticPattern);
+ detections.push({
+ type: syntheticPattern.type,
+ value: ner.text,
+ placeholder,
+ position: [ner.start, ner.end],
+ severity: "medium",
+ confidence: ner.confidence
+ });
+ }
  }
  if (this.contextRulesEngine && detections.length > 0) {
  const piiMatches = detections.map((det) => ({
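In 1.1.0 the NER pass no longer only rescales the confidence of regex hits: entities that the NER model found on its own are appended as extra detections typed `NER_<type>` with `"medium"` severity. `extractNEROnly` and `generatePlaceholder` are not shown in this diff, so the sketch below assumes that "NER-only" means "does not intersect any existing detection" and that `{n}` is replaced by a per-type counter. Every type and function name in it is hypothetical.

```typescript
// Hypothetical shapes; the package's NER match and detection types are not in this diff.
interface NerMatch { type: string; text: string; start: number; end: number; confidence: number; }
interface Detection {
  type: string;
  value: string;
  placeholder: string;
  position: [number, number];
  severity: string;
  confidence: number;
}

// Assumption: an NER match is "NER-only" when it shares no characters with any detection.
function nerOnly(ner: NerMatch[], existing: Detection[]): NerMatch[] {
  return ner.filter((m) =>
    existing.every((d) => m.end <= d.position[0] || m.start >= d.position[1]),
  );
}

// Appends NER-only entities using the NER_<type> naming and "medium" severity from the diff.
// The per-type counter stands in for generatePlaceholder's `{n}` substitution (an assumption).
function appendNerDetections(detections: Detection[], ner: NerMatch[]): Detection[] {
  const out = [...detections];
  const counters = new Map<string, number>();
  for (const m of nerOnly(ner, detections)) {
    const type = `NER_${m.type}`;
    const n = (counters.get(type) ?? 0) + 1;
    counters.set(type, n);
    out.push({
      type,
      value: m.text,
      placeholder: `[${type}_${n}]`,
      position: [m.start, m.end],
      severity: "medium",
      confidence: m.confidence,
    });
  }
  return out;
}

const text = "Alice Smith emailed bob@example.com";
const emailStart = text.indexOf("bob@example.com");
const regexHits: Detection[] = [{
  type: "EMAIL", value: "bob@example.com", placeholder: "[EMAIL_1]",
  position: [emailStart, emailStart + 15], severity: "high", confidence: 0.9,
}];
const result = appendNerDetections(regexHits, [
  { type: "PERSON", text: "Alice Smith", start: 0, end: 11, confidence: 0.8 }, // new: appended
  { type: "MISC", text: "bob@example.com", start: emailStart, end: emailStart + 15, confidence: 0.6 }, // already covered: skipped
]);
console.log(result.map((d) => d.placeholder).join(" ")); // [EMAIL_1] [NER_PERSON_1]
```

Note the ordering the diff enforces: `piiMatches` is rebuilt after the hybrid confidence update, so the NER-only filter compares against the adjusted regex detections rather than the originals.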
@@ -16822,7 +16764,7 @@ var OpenRedaction = class OpenRedaction {
  }
  /**
  * Detect PII in text
- * Now async to support optional AI assist
+ * Async API for detection pipeline (NER, multi-pass, etc.)
  */
  async detect(text) {
  if (this.rbacManager && !this.rbacManager.hasPermission("detection:detect")) throw new Error("[OpenRedaction] Permission denied: detection:detect required");
@@ -16862,21 +16804,6 @@ var OpenRedaction = class OpenRedaction {
  }
  detections = mergePassDetections(passDetections, this.multiPassConfig);
  } else detections = this.processPatterns(text, this.patterns, processedRanges);
- if (this.options.ai?.enabled) {
- const aiEndpoint = getAIEndpoint(this.options.ai);
- if (aiEndpoint) try {
- if (this.options.debug) console.log("[OpenRedaction] AI assist enabled, calling AI endpoint...");
- const aiEntities = await callAIDetect(text, aiEndpoint, this.options.debug);
- if (aiEntities && aiEntities.length > 0) {
- if (this.options.debug) console.log(`[OpenRedaction] AI returned ${aiEntities.length} additional entities`);
- detections = mergeAIEntities(detections, aiEntities, text);
- if (this.options.debug) console.log(`[OpenRedaction] After AI merge: ${detections.length} total detections`);
- } else if (this.options.debug) console.log("[OpenRedaction] AI endpoint returned no additional entities");
- } catch (error) {
- if (this.options.debug) console.warn(`[OpenRedaction] AI assist failed, using regex-only: ${error instanceof Error ? error.message : "Unknown error"}`);
- }
- else if (this.options.debug) console.warn("[OpenRedaction] AI assist enabled but no endpoint configured. Set ai.endpoint or OPENREDACTION_AI_ENDPOINT env var.");
- }
  detections.sort((a, b) => b.position[0] - a.position[0]);
  let redacted = text;
  const redactionMap = {};
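The surviving context line sorts detections by descending start offset before building `redacted`. The code that follows is cut off in this diff, but the usual reason for that ordering is that splicing placeholders in from the end of the string leaves the offsets of earlier detections untouched. A minimal sketch of the technique, with hypothetical names and assuming the ranges do not overlap:

```typescript
// Hypothetical minimal detection: character range in the ORIGINAL text plus its replacement.
interface Replacement {
  position: [number, number]; // [start, end)
  placeholder: string;
}

// Applies replacements from the highest start offset to the lowest, so splicing one
// range never shifts the offsets of ranges that have not been applied yet.
// Assumes ranges do not overlap.
function applyRightToLeft(text: string, items: Replacement[]): string {
  const sorted = [...items].sort((a, b) => b.position[0] - a.position[0]);
  let out = text;
  for (const { position: [start, end], placeholder } of sorted) {
    out = out.slice(0, start) + placeholder + out.slice(end);
  }
  return out;
}

const text = "Call 555-0100 or mail a@b.io";
const phone = text.indexOf("555-0100");
const email = text.indexOf("a@b.io");
const redacted = applyRightToLeft(text, [
  { position: [phone, phone + 8], placeholder: "[PHONE_1]" },
  { position: [email, email + 6], placeholder: "[EMAIL_1]" },
]);
console.log(redacted); // Call [PHONE_1] or mail [EMAIL_1]
```

Applying the same two replacements in ascending order would require shifting the e-mail range after the first splice, because `[PHONE_1]` is one character longer than `555-0100`; the descending sort removes that bookkeeping.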