dpdp-erasure-cli 1.0.11 → 1.1.0
- package/README.md +40 -85
- package/compliance.worker.yml +164 -0
- package/package.json +2 -2
- package/report.json +370 -0
- package/report.md +57 -0
- package/src/modules/engine/vault/satellite-mutation.ts +4 -4
- package/src/modules/engine/vault/satellite.ts +4 -4
- package/src/modules/introspector/classifier.ts +18 -6
- package/src/modules/introspector/dag.ts +19 -1
- package/src/modules/introspector/run.ts +51 -9
- package/src/modules/introspector/yaml.ts +14 -0
package/README.md
CHANGED
|
@@ -1,31 +1,38 @@
|
|
|
1
|
-
#
|
|
1
|
+
# dpdp-erasure-cli
|
|
2
2
|
|
|
3
3
|
[](https://badge.fury.io/js/dpdp-erasure-cli)
|
|
4
4
|
|
|
5
|
-
The
|
|
5
|
+
**The DPDP Erasure Engine CLI** is an automated, AI-assisted privacy toolkit that helps you securely discover, map, and cryptographically shred PII (Personally Identifiable Information) in your database.
|
|
6
6
|
|
|
7
|
-
It
|
|
7
|
+
It acts as the control plane for the [DPDP Erasure Engine](https://github.com/devxdh/dpdp-erasure-engine), allowing Data Protection Officers (DPOs) and Software Engineers to effortlessly comply with global privacy laws (DPDP, GDPR, CCPA) without writing manual SQL deletion scripts.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## 🎯 What does it do?
|
|
12
|
+
|
|
13
|
+
Manually deleting a user across dozens of microservice tables is dangerous and prone to failure. `dpdp-erasure-cli` solves this by:
|
|
14
|
+
|
|
15
|
+
1. **Introspection & NLP Mapping:** Safely scans your live database (using `TABLESAMPLE` block sampling) to find hidden PII in text columns, JSON blobs, and orphaned tables.
|
|
16
|
+
2. **DAG Compilation:** Maps your entire Foreign Key graph to figure out the exact order tables must be deleted to avoid database constraint violations.
|
|
17
|
+
3. **Drafting a Manifest:** Automatically generates a `compliance.worker.yml` erasure plan that handles HMAC redaction vs. hard deletion.
|
|
18
|
+
4. **Cryptographic Signatures:** Locks the manifest using Ed25519 signatures so production deletion rules cannot be silently altered.
|
|
19
|
+
5. **Dry-Run Simulations:** Tests the erasure locally in a rolled-back PostgreSQL transaction to prove it works before you deploy.
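The DAG compilation step in item 2 can be pictured as a post-order walk of the foreign-key graph, so that every child table is listed before its parent. The sketch below is illustrative only: `deletionOrder` and the `ForeignKey` type are hypothetical names, and the CLI's actual logic in `dag.ts` is not shown in this diff.

```typescript
// Hypothetical sketch: order tables so that children are erased before parents.
type ForeignKey = { child: string; parent: string };

function deletionOrder(root: string, fks: ForeignKey[]): string[] {
  // Build a parent -> children adjacency map.
  const children = new Map<string, string[]>();
  for (const { child, parent } of fks) {
    const list = children.get(parent) ?? [];
    list.push(child);
    children.set(parent, list);
  }

  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (table: string): void => {
    const s = state.get(table);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`Cycle detected at ${table}`);
    state.set(table, "visiting");
    for (const c of children.get(table) ?? []) visit(c);
    state.set(table, "done");
    order.push(table); // post-order: all descendants are already listed
  };

  visit(root);
  return order;
}

// Edges taken from the sample manifest in this diff.
const sampleOrder = deletionOrder("public.users", [
  { child: "public.orders", parent: "public.users" },
  { child: "public.order_items", parent: "public.orders" },
  { child: "public.payments", parent: "public.orders" },
]);
console.log(sampleOrder);
```

A depth-first walk also makes cycle detection cheap, which matters because a cyclic foreign-key graph has no valid deletion order.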
|
|
8
20
|
|
|
9
21
|
---
|
|
10
22
|
|
|
11
23
|
## 🚀 Installation
|
|
12
24
|
|
|
13
|
-
This CLI
|
|
25
|
+
This CLI relies on [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution.
|
|
14
26
|
|
|
15
27
|
```bash
|
|
16
28
|
npm install -g dpdp-erasure-cli
|
|
17
29
|
```
|
|
18
30
|
|
|
19
|
-
*Alternatively, if running from the monorepo root:*
|
|
20
|
-
```bash
|
|
21
|
-
bun run --cwd ./apps/worker cli <command>
|
|
22
|
-
```
|
|
23
|
-
|
|
24
31
|
---
|
|
25
32
|
|
|
26
|
-
## 🛠️ Interactive
|
|
33
|
+
## 🛠️ Interactive Setup
|
|
27
34
|
|
|
28
|
-
|
|
35
|
+
Don't want to memorize commands? Just run the CLI with no arguments to launch the interactive wizard:
|
|
29
36
|
|
|
30
37
|
```bash
|
|
31
38
|
dpdp-cli
|
|
@@ -33,108 +40,56 @@ dpdp-cli
|
|
|
33
40
|
|
|
34
41
|
---
|
|
35
42
|
|
|
36
|
-
## 📚
|
|
43
|
+
## 📚 Quick Start Guide
|
|
44
|
+
|
|
45
|
+
Setting up your database for privacy compliance follows this simple 5-step workflow:
|
|
37
46
|
|
|
38
|
-
### 1.
|
|
39
|
-
Safely analyze your
|
|
47
|
+
### 1. Introspect Your Database
|
|
48
|
+
Safely analyze your schema to discover PII and draft the deletion manifest. The AI will even find logical links if you don't use strict Foreign Keys!
|
|
40
49
|
|
|
41
50
|
```bash
|
|
42
51
|
dpdp-cli introspect \
|
|
43
|
-
--url postgres://user:pass@localhost:5432/app_db \
|
|
52
|
+
--url "postgres://user:pass@localhost:5432/app_db" \
|
|
44
53
|
--root public.users \
|
|
45
54
|
--schema public \
|
|
46
|
-
--output ./compliance.worker.yml
|
|
55
|
+
--output ./compliance.worker.yml
|
|
47
56
|
```
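To make the sampling idea concrete, here is a minimal sketch of how a block-sampled probe query for one column could be built. PostgreSQL's `TABLESAMPLE SYSTEM (n)` clause is real syntax; the helper names `quoteIdent` and `buildSampleQuery`, and the sample percentage and row limit used in the example call, are assumptions for illustration rather than the CLI's actual query.

```typescript
// Hypothetical helpers: not part of dpdp-erasure-cli's documented API.
function quoteIdent(name: string): string {
  // Standard SQL identifier quoting: wrap in double quotes, double any embedded quote.
  return `"${name.replace(/"/g, '""')}"`;
}

function buildSampleQuery(
  schema: string,
  table: string,
  column: string,
  samplePercent: number,
  limit: number,
): string {
  if (!(samplePercent > 0 && samplePercent <= 100)) {
    throw new RangeError("samplePercent must be in (0, 100]");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const col = quoteIdent(column);
  const rel = `${quoteIdent(schema)}.${quoteIdent(table)}`;
  // SYSTEM sampling picks whole storage blocks, so it avoids a full table scan.
  return `SELECT ${col}::text AS value FROM ${rel} TABLESAMPLE SYSTEM (${samplePercent}) WHERE ${col} IS NOT NULL LIMIT ${limit}`;
}

console.log(buildSampleQuery("public", "users", "email", 1, 50));
```

Block sampling trades statistical uniformity for speed, so a small table may return no rows at a low percentage; a real scanner would need a fallback for that case.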
|
|
48
57
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
* `-r, --root <table>`: The root table containing the user/subject identifier (e.g., `public.users`).
|
|
52
|
-
* `-s, --schema <schema>`: The target PostgreSQL schema (defaults to `public`).
|
|
53
|
-
* `-o, --output <path>`: Where to write the generated YAML draft (defaults to `compliance.worker.yml.draft`).
|
|
54
|
-
* `-d, --max-depth <depth>`: Limit for recursive Foreign Key traversal (default: `32`).
|
|
55
|
-
* `--sample-percent <percent>`: Percentage of data to sample using `TABLESAMPLE` for PII detection (default: `1`).
|
|
56
|
-
* `--threshold <score>`: Confidence score required to flag a column as PII (default: `0.75`).
|
|
57
|
-
* `--report <path>`: Write a readable Markdown report of the findings.
|
|
58
|
-
|
|
59
|
-
---
|
|
60
|
-
|
|
61
|
-
### 2. `scan` (Quick PII Check)
|
|
62
|
-
A lightweight, metadata-only schema scan that looks for potential PII columns based purely on naming conventions, without the heavy block sampling used by `introspect`.
|
|
63
|
-
|
|
64
|
-
```bash
|
|
65
|
-
dpdp-cli scan --url "postgres://user:pass@localhost:5432/app_db" --schema public
|
|
66
|
-
```
|
|
67
|
-
|
|
68
|
-
---
|
|
69
|
-
|
|
70
|
-
### 3. `keygen` (Security Provisioning)
|
|
71
|
-
Provisions secure Ed25519 cryptographic keys required to sign your configuration manifest.
|
|
58
|
+
### 2. Review and Attest
|
|
59
|
+
Open the generated `compliance.worker.yml`. Review the `targets` and `join` conditions. Once you are confident, sign off by updating the `legal_attestation` block.
|
|
72
60
|
|
|
61
|
+
### 3. Generate Security Keys
|
|
62
|
+
Create a private/public keypair to securely sign your manifest for production environments.
|
|
73
63
|
```bash
|
|
74
64
|
dpdp-cli keygen
|
|
75
65
|
```
|
|
76
|
-
*This generates a private key file (e.g., `worker.pkcs8.key`) and a public key.*
|
|
77
|
-
|
|
78
|
-
---
|
|
79
|
-
|
|
80
|
-
### 4. `sign` (Cryptographic Manifest Lock)
|
|
81
|
-
To prevent unauthorized changes to data erasure rules in production, the manifest must be cryptographically signed by a Data Protection Officer (DPO) or Lead Engineer.
|
|
82
66
|
|
|
67
|
+
### 4. Cryptographically Sign the Manifest
|
|
68
|
+
Lock down the rules to prevent unauthorized changes in your CI/CD pipeline.
|
|
83
69
|
```bash
|
|
84
70
|
dpdp-cli sign --config ./compliance.worker.yml --key ./worker.pkcs8.key
|
|
85
71
|
```
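For readers unfamiliar with detached signatures, the following sketch shows the general Ed25519 sign-then-verify pattern using Node's built-in `node:crypto` module. It is not the CLI's implementation: the function names `signManifest` and `verifyManifest` are hypothetical, and the key pair is generated in memory here instead of being loaded from a key file.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Assumption: an in-memory key pair stands in for the files produced by `keygen`.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signManifest(manifest: string): Buffer {
  // Ed25519 hashes internally, so the algorithm argument is null.
  return sign(null, Buffer.from(manifest, "utf8"), privateKey);
}

function verifyManifest(manifest: string, signature: Buffer): boolean {
  return verify(null, Buffer.from(manifest, "utf8"), publicKey, signature);
}

const manifestText = "rules:\n  - id: dpdp_standard\n";
const detachedSignature = signManifest(manifestText);

console.log(verifyManifest(manifestText, detachedSignature)); // untouched manifest
console.log(verifyManifest(manifestText + "# edited\n", detachedSignature)); // tampered manifest
```

Because the signature covers every byte of the manifest, any edit after signing, even a comment, causes verification to fail until the file is re-signed.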
|
|
86
|
-
*This generates a detached signature file (e.g., `compliance.worker.yml.sig`). The worker will fail to boot if this signature does not match the manifest.*
|
|
87
|
-
|
|
88
|
-
---
|
|
89
|
-
|
|
90
|
-
### 5. `check-integrity` & `verify-schema` (CI/CD Gates)
|
|
91
|
-
These commands are designed for CI/CD pipelines to ensure the live database schema matches the legal attestation hash stored in the signed manifest.
|
|
92
72
|
|
|
73
|
+
### 5. Simulate an Erasure (Dry-Run)
|
|
74
|
+
Test the erasure on a specific user. This command runs entirely within an isolated transaction that is automatically rolled back, so no changes are persisted to the database.
|
|
93
75
|
```bash
|
|
94
|
-
|
|
95
|
-
dpdp-cli check-integrity --url "postgres://.../app_db" --config ./compliance.worker.yml
|
|
96
|
-
|
|
97
|
-
# Check only the live schema hash against the legal attestation
|
|
98
|
-
dpdp-cli verify-schema --url "postgres://.../app_db" --config ./compliance.worker.yml
|
|
76
|
+
dpdp-cli dry-run --id "user_12345" --url "postgres://user:pass@localhost:5432/app_db" --config ./compliance.worker.yml
|
|
99
77
|
```
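The rollback guarantee described above follows a common pattern: open a transaction, run the statements, and always issue `ROLLBACK`, whether they succeed or fail. The sketch below illustrates that pattern against a minimal, hypothetical `SqlClient` interface; it is an assumption about the general approach, not the CLI's code, and the recording client exists only so the example runs without a database.

```typescript
// Hypothetical minimal client interface; a real driver would be adapted to it.
interface SqlClient {
  query(sql: string): Promise<unknown>;
}

async function runInRolledBackTransaction(
  client: SqlClient,
  statements: string[],
): Promise<void> {
  await client.query("BEGIN");
  try {
    for (const sql of statements) {
      await client.query(sql);
    }
  } finally {
    // Roll back whether the statements succeeded or threw, so nothing persists.
    await client.query("ROLLBACK");
  }
}

// Recording client used purely for demonstration.
const issued: string[] = [];
const recordingClient: SqlClient = {
  query: async (sql) => {
    issued.push(sql);
    return undefined;
  },
};

runInRolledBackTransaction(recordingClient, [
  "UPDATE public.user_devices SET last_ip_address = NULL WHERE user_id = 'user_12345'",
]).then(() => console.log(issued));
```

Placing `ROLLBACK` in a `finally` block is the key design choice: a failing statement still leaves the database untouched.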
|
|
100
|
-
*If a developer adds a new column to the database without updating and re-signing the manifest, these commands will exit with a non-zero status.*
|
|
101
78
|
|
|
102
79
|
---
|
|
103
80
|
|
|
104
|
-
|
|
105
|
-
Simulates a PII vault and redaction operation for a specific user. It runs inside an isolated transaction that is automatically rolled back.
|
|
81
|
+
## 🔒 CI/CD Integrity Checks
|
|
106
82
|
|
|
107
|
-
|
|
108
|
-
dpdp-cli dry-run --id "user_12345" --url "postgres://.../app_db" --config ./compliance.worker.yml
|
|
109
|
-
```
|
|
110
|
-
*This is the recommended safety check to help ensure your configuration captures related PII without breaking foreign keys.*
|
|
111
|
-
|
|
112
|
-
---
|
|
113
|
-
|
|
114
|
-
### 7. `graph` (Dependency Visualization)
|
|
115
|
-
Visualizes the recursive table dependencies (FK DAG) for a specific root table, helping you understand how data cascades down from a user.
|
|
83
|
+
You can use the CLI in your GitHub Actions or GitLab CI to fail builds if a developer modifies the database schema without updating the signed compliance manifest:
|
|
116
84
|
|
|
117
85
|
```bash
|
|
118
|
-
dpdp-cli
|
|
86
|
+
dpdp-cli check-integrity --url "postgres://..." --config ./compliance.worker.yml
|
|
119
87
|
```
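Conceptually, a gate like this compares a fingerprint of the live schema with the `schema_hash` recorded in the manifest. The sketch below shows one way such a check could work. The canonical form, the use of SHA-256 (suggested only by the 64-hex-character hash in the sample manifest), and the names `schemaFingerprint` and `checkIntegrity` are all assumptions; the CLI's real hashing scheme is not documented in this diff.

```typescript
import { createHash } from "node:crypto";

type ColumnInfo = { table: string; column: string; dataType: string };

// Assumption: the fingerprint is a SHA-256 over a sorted, line-based listing of columns.
function schemaFingerprint(columns: ColumnInfo[]): string {
  const canonical = columns
    .map((c) => `${c.table}.${c.column}:${c.dataType}`)
    .sort()
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Returns a process exit code: 0 when the live schema matches the attested hash.
function checkIntegrity(live: ColumnInfo[], attestedHash: string): number {
  return schemaFingerprint(live) === attestedHash ? 0 : 1;
}

const attestedSchema: ColumnInfo[] = [
  { table: "public.users", column: "email", dataType: "character varying" },
  { table: "public.users", column: "phone_number", dataType: "character varying" },
];
const attested = schemaFingerprint(attestedSchema);

// A developer adds a column without re-signing the manifest.
const driftedSchema = [
  ...attestedSchema,
  { table: "public.users", column: "date_of_birth", dataType: "date" },
];

console.log(checkIntegrity(attestedSchema, attested));
console.log(checkIntegrity(driftedSchema, attested));
```

Sorting before hashing makes the fingerprint independent of the order in which the database returns columns, so only real schema changes alter it.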
|
|
120
88
|
|
|
121
89
|
---
|
|
122
90
|
|
|
123
|
-
##
|
|
124
|
-
|
|
125
|
-
Setting up the engine generally follows this workflow:
|
|
91
|
+
## 📖 Deep Dive
|
|
126
92
|
|
|
127
|
-
|
|
128
|
-
`dpdp-cli introspect -u postgres://... -r public.users -s public -o compliance.worker.yml`
|
|
129
|
-
2. **Review & Tweak** the `compliance.worker.yml` manually (fix false positives, add missing logical links, select masking actions like `HMAC` or `SET NULL`).
|
|
130
|
-
3. **Generate Keys** for signing:
|
|
131
|
-
`dpdp-cli keygen`
|
|
132
|
-
4. **Sign** the finalized manifest:
|
|
133
|
-
`dpdp-cli sign -c compliance.worker.yml -k worker.pkcs8.key`
|
|
134
|
-
5. **Dry-Run** an erasure to verify it behaves as expected:
|
|
135
|
-
`dpdp-cli dry-run -i "test_user_id" -u postgres://... -c compliance.worker.yml`
|
|
136
|
-
6. **Deploy** the signed manifest and the detached `.sig` file to your production Worker.
|
|
137
|
-
|
|
138
|
-
---
|
|
93
|
+
Want to understand the cryptographic shredding architecture under the hood? Read our full documentation at the main repository:
|
|
139
94
|
|
|
140
|
-
|
|
95
|
+
**[DPDP Erasure Engine GitHub Repository](https://github.com/devxdh/dpdp-erasure-engine)**
|
|
@@ -0,0 +1,164 @@
|
|
|
1
|
+
# AUTO-GENERATED BY INTROSPECTOR
|
|
2
|
+
# REVIEW REQUIRED: DPO must validate every table, join condition, and PII column before production use.
|
|
3
|
+
# Generated At: 2026-06-12T05:19:07.235Z
|
|
4
|
+
|
|
5
|
+
legal_attestation:
|
|
6
|
+
dpo_identifier: PENDING_REVIEW
|
|
7
|
+
configuration_version: introspector-draft
|
|
8
|
+
legal_review_date: PENDING_REVIEW
|
|
9
|
+
schema_hash: ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c
|
|
10
|
+
generated_by: compliance-introspector-v1
|
|
11
|
+
acknowledgment: PENDING_REVIEW
|
|
12
|
+
|
|
13
|
+
legal_disclaimer:
|
|
14
|
+
text: "Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings."
|
|
15
|
+
|
|
16
|
+
rules:
|
|
17
|
+
- id: dpdp_standard
|
|
18
|
+
root_table: public.users
|
|
19
|
+
max_depth: 32
|
|
20
|
+
targets:
|
|
21
|
+
- table: public.users
|
|
22
|
+
# Introspector Confidence: 0.950 (email)
|
|
23
|
+
# Introspector Confidence: 0.920 (indian_mobile)
|
|
24
|
+
pii_columns: [email, phone_number]
|
|
25
|
+
- table: public.kyc_documents
|
|
26
|
+
parent: public.users
|
|
27
|
+
join: "public.users.id = public.kyc_documents.user_id"
|
|
28
|
+
parent_columns: [id]
|
|
29
|
+
child_columns: [user_id]
|
|
30
|
+
pii_columns: []
|
|
31
|
+
- table: public.orders
|
|
32
|
+
parent: public.users
|
|
33
|
+
join: "public.users.id = public.orders.user_id"
|
|
34
|
+
parent_columns: [id]
|
|
35
|
+
child_columns: [user_id]
|
|
36
|
+
pii_columns: []
|
|
37
|
+
- table: public.support_tickets
|
|
38
|
+
parent: public.users
|
|
39
|
+
join: "public.users.id = public.support_tickets.user_id"
|
|
40
|
+
parent_columns: [id]
|
|
41
|
+
child_columns: [user_id]
|
|
42
|
+
# Introspector Confidence: 0.900 (indian_mobile)
|
|
43
|
+
pii_columns: [description]
|
|
44
|
+
primary_key_columns: [id]
|
|
45
|
+
action: redact
|
|
46
|
+
mutation_rules:
|
|
47
|
+
description: HMAC
|
|
48
|
+
- table: public.user_addresses
|
|
49
|
+
parent: public.users
|
|
50
|
+
join: "public.users.id = public.user_addresses.user_id"
|
|
51
|
+
parent_columns: [id]
|
|
52
|
+
child_columns: [user_id]
|
|
53
|
+
# Introspector Confidence: 0.780 (indian_pin_code)
|
|
54
|
+
pii_columns: [pincode]
|
|
55
|
+
primary_key_columns: [id]
|
|
56
|
+
action: redact
|
|
57
|
+
mutation_rules:
|
|
58
|
+
pincode: HMAC
|
|
59
|
+
- table: public.user_devices
|
|
60
|
+
parent: public.users
|
|
61
|
+
join: "public.users.id = public.user_devices.user_id"
|
|
62
|
+
parent_columns: [id]
|
|
63
|
+
child_columns: [user_id]
|
|
64
|
+
# Introspector Confidence: 0.820 (metadata)
|
|
65
|
+
pii_columns: [last_ip_address]
|
|
66
|
+
primary_key_columns: [id]
|
|
67
|
+
action: redact
|
|
68
|
+
mutation_rules:
|
|
69
|
+
last_ip_address: HMAC
|
|
70
|
+
- table: public.user_preferences
|
|
71
|
+
parent: public.users
|
|
72
|
+
join: "public.users.id = public.user_preferences.user_id"
|
|
73
|
+
parent_columns: [id]
|
|
74
|
+
child_columns: [user_id]
|
|
75
|
+
pii_columns: []
|
|
76
|
+
- table: public.order_items
|
|
77
|
+
parent: public.orders
|
|
78
|
+
join: "public.orders.id = public.order_items.order_id"
|
|
79
|
+
parent_columns: [id]
|
|
80
|
+
child_columns: [order_id]
|
|
81
|
+
pii_columns: []
|
|
82
|
+
- table: public.payments
|
|
83
|
+
parent: public.orders
|
|
84
|
+
join: "public.orders.id = public.payments.order_id"
|
|
85
|
+
parent_columns: [id]
|
|
86
|
+
child_columns: [order_id]
|
|
87
|
+
pii_columns: []
|
|
88
|
+
- table: public.ticket_messages
|
|
89
|
+
parent: public.support_tickets
|
|
90
|
+
join: "public.support_tickets.id = public.ticket_messages.ticket_id"
|
|
91
|
+
parent_columns: [id]
|
|
92
|
+
child_columns: [ticket_id]
|
|
93
|
+
# Introspector Confidence: 0.900 (indian_mobile)
|
|
94
|
+
pii_columns: [message_body]
|
|
95
|
+
primary_key_columns: [id]
|
|
96
|
+
action: redact
|
|
97
|
+
mutation_rules:
|
|
98
|
+
message_body: HMAC
|
|
99
|
+
- table: public.abandoned_carts
|
|
100
|
+
parent: public.users
|
|
101
|
+
join: "LOGICAL_LINK (customer_id)"
|
|
102
|
+
parent_columns: [customer_id]
|
|
103
|
+
child_columns: [customer_id]
|
|
104
|
+
pii_columns: []
|
|
105
|
+
- table: public.audit_logs
|
|
106
|
+
parent: public.users
|
|
107
|
+
join: "LOGICAL_LINK (actor_id)"
|
|
108
|
+
parent_columns: [actor_id]
|
|
109
|
+
child_columns: [actor_id]
|
|
110
|
+
# Introspector Confidence: 0.820 (ipv4)
|
|
111
|
+
pii_columns: [ip_address]
|
|
112
|
+
primary_key_columns: [id]
|
|
113
|
+
action: redact
|
|
114
|
+
mutation_rules:
|
|
115
|
+
ip_address: HMAC
|
|
116
|
+
- table: public.legacy_crm_notes
|
|
117
|
+
parent: public.users
|
|
118
|
+
join: "LOGICAL_LINK (client_id)"
|
|
119
|
+
parent_columns: [client_id]
|
|
120
|
+
child_columns: [client_id]
|
|
121
|
+
# Introspector Confidence: 0.950 (email, indian_mobile)
|
|
122
|
+
pii_columns: [agent_notes]
|
|
123
|
+
primary_key_columns: [id]
|
|
124
|
+
action: redact
|
|
125
|
+
mutation_rules:
|
|
126
|
+
agent_notes: HMAC
|
|
127
|
+
- table: public.marketing_campaign_clicks
|
|
128
|
+
parent: public.users
|
|
129
|
+
join: "LOGICAL_LINK (target_email)"
|
|
130
|
+
parent_columns: [target_email]
|
|
131
|
+
child_columns: [target_email]
|
|
132
|
+
# Introspector Confidence: 0.950 (email)
|
|
133
|
+
pii_columns: [target_email]
|
|
134
|
+
primary_key_columns: [id]
|
|
135
|
+
action: redact
|
|
136
|
+
mutation_rules:
|
|
137
|
+
target_email: HMAC
|
|
138
|
+
- table: public.third_party_telemetry
|
|
139
|
+
parent: public.users
|
|
140
|
+
join: "LOGICAL_LINK (user_uuid)"
|
|
141
|
+
parent_columns: [user_uuid]
|
|
142
|
+
child_columns: [user_uuid]
|
|
143
|
+
pii_columns: []
|
|
144
|
+
|
|
145
|
+
# [Potential Logical Link] public.users.customer_id <-> public.abandoned_carts.customer_id - Table exposes customer_id which conceptually maps to the root entity.
|
|
146
|
+
# [Potential Logical Link] public.users.actor_id <-> public.audit_logs.actor_id - Table exposes actor_id which conceptually maps to the root entity.
|
|
147
|
+
# [Potential Logical Link] public.kyc_documents.user_id <-> public.orders.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
148
|
+
# [Potential Logical Link] public.kyc_documents.user_id <-> public.support_tickets.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
149
|
+
# [Potential Logical Link] public.kyc_documents.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
150
|
+
# [Potential Logical Link] public.kyc_documents.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
151
|
+
# [Potential Logical Link] public.kyc_documents.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
152
|
+
# [Potential Logical Link] public.orders.user_id <-> public.support_tickets.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
153
|
+
# [Potential Logical Link] public.orders.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
154
|
+
# [Potential Logical Link] public.orders.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
155
|
+
# [Potential Logical Link] public.orders.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
156
|
+
# [Potential Logical Link] public.support_tickets.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
157
|
+
# [Potential Logical Link] public.support_tickets.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
158
|
+
# [Potential Logical Link] public.support_tickets.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
159
|
+
# [Potential Logical Link] public.user_addresses.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
160
|
+
# [Potential Logical Link] public.user_addresses.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
161
|
+
# [Potential Logical Link] public.user_devices.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
|
|
162
|
+
# [Potential Logical Link] public.users.client_id <-> public.legacy_crm_notes.client_id - Table exposes client_id which conceptually maps to the root entity.
|
|
163
|
+
# [Potential Logical Link] public.users.target_email <-> public.marketing_campaign_clicks.target_email - Table exposes target_email which conceptually maps to the root entity.
|
|
164
|
+
# [Potential Logical Link] public.users.user_uuid <-> public.third_party_telemetry.user_uuid - Table exposes user_uuid which conceptually maps to the root entity.
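The `mutation_rules` entries above mark columns for `HMAC` redaction. As a rough illustration of what keyed-hash redaction means, the sketch below replaces a value with its HMAC-SHA256 digest using Node's `node:crypto`. The function name `hmacRedact`, the choice of SHA-256, the hex encoding, and the example key are assumptions; the engine's actual redaction format is not specified in this diff.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical sketch: replace a PII value with a keyed, one-way digest.
function hmacRedact(value: string, secretKey: string): string {
  return createHmac("sha256", secretKey).update(value, "utf8").digest("hex");
}

const exampleKey = "example-key-do-not-use-in-production"; // placeholder only
console.log(hmacRedact("192.0.2.10", exampleKey));
```

Unlike a plain hash, the digest cannot be recomputed by someone who lacks the key, and equal inputs still map to equal outputs, which preserves joins and deduplication on the redacted column. Destroying the key later makes the digests unlinkable to the original values.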
|
package/package.json
CHANGED
package/report.json
ADDED
|
@@ -0,0 +1,370 @@
|
|
|
1
|
+
{
|
|
2
|
+
"summary": {
|
|
3
|
+
"rootTable": "public.users",
|
|
4
|
+
"generatedAt": "2026-06-12T05:19:07.235Z",
|
|
5
|
+
"schemaHash": "ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c",
|
|
6
|
+
"targetCount": 15,
|
|
7
|
+
"tablesWithPii": 8,
|
|
8
|
+
"piiColumnCount": 9,
|
|
9
|
+
"highConfidenceCount": 6,
|
|
10
|
+
"reviewRequiredCount": 3,
|
|
11
|
+
"potentialLogicalLinkCount": 20
|
|
12
|
+
},
|
|
13
|
+
"findings": [
|
|
14
|
+
{
|
|
15
|
+
"table": "public.legacy_crm_notes",
|
|
16
|
+
"column": "agent_notes",
|
|
17
|
+
"dataType": "text",
|
|
18
|
+
"confidence": 0.95,
|
|
19
|
+
"metadataScore": 0,
|
|
20
|
+
"contentMatchRatio": 1,
|
|
21
|
+
"sampleSize": 50,
|
|
22
|
+
"matchedSignatures": [
|
|
23
|
+
"email",
|
|
24
|
+
"indian_mobile"
|
|
25
|
+
]
|
|
26
|
+
},
|
|
27
|
+
{
|
|
28
|
+
"table": "public.marketing_campaign_clicks",
|
|
29
|
+
"column": "target_email",
|
|
30
|
+
"dataType": "character varying",
|
|
31
|
+
"confidence": 0.95,
|
|
32
|
+
"metadataScore": 0.92,
|
|
33
|
+
"contentMatchRatio": 1,
|
|
34
|
+
"sampleSize": 50,
|
|
35
|
+
"matchedSignatures": [
|
|
36
|
+
"email"
|
|
37
|
+
]
|
|
38
|
+
},
|
|
39
|
+
{
|
|
40
|
+
"table": "public.users",
|
|
41
|
+
"column": "email",
|
|
42
|
+
"dataType": "character varying",
|
|
43
|
+
"confidence": 0.95,
|
|
44
|
+
"metadataScore": 0.92,
|
|
45
|
+
"contentMatchRatio": 1,
|
|
46
|
+
"sampleSize": 50,
|
|
47
|
+
"matchedSignatures": [
|
|
48
|
+
"email"
|
|
49
|
+
]
|
|
50
|
+
},
|
|
51
|
+
{
|
|
52
|
+
"table": "public.users",
|
|
53
|
+
"column": "phone_number",
|
|
54
|
+
"dataType": "character varying",
|
|
55
|
+
"confidence": 0.92,
|
|
56
|
+
"metadataScore": 0.92,
|
|
57
|
+
"contentMatchRatio": 1,
|
|
58
|
+
"sampleSize": 50,
|
|
59
|
+
"matchedSignatures": [
|
|
60
|
+
"indian_mobile"
|
|
61
|
+
]
|
|
62
|
+
},
|
|
63
|
+
{
|
|
64
|
+
"table": "public.support_tickets",
|
|
65
|
+
"column": "description",
|
|
66
|
+
"dataType": "text",
|
|
67
|
+
"confidence": 0.9,
|
|
68
|
+
"metadataScore": 0,
|
|
69
|
+
"contentMatchRatio": 1,
|
|
70
|
+
"sampleSize": 50,
|
|
71
|
+
"matchedSignatures": [
|
|
72
|
+
"indian_mobile"
|
|
73
|
+
]
|
|
74
|
+
},
|
|
75
|
+
{
|
|
76
|
+
"table": "public.ticket_messages",
|
|
77
|
+
"column": "message_body",
|
|
78
|
+
"dataType": "text",
|
|
79
|
+
"confidence": 0.9,
|
|
80
|
+
"metadataScore": 0,
|
|
81
|
+
"contentMatchRatio": 1,
|
|
82
|
+
"sampleSize": 50,
|
|
83
|
+
"matchedSignatures": [
|
|
84
|
+
"indian_mobile"
|
|
85
|
+
]
|
|
86
|
+
},
|
|
87
|
+
{
|
|
88
|
+
"table": "public.audit_logs",
|
|
89
|
+
"column": "ip_address",
|
|
90
|
+
"dataType": "inet",
|
|
91
|
+
"confidence": 0.82,
|
|
92
|
+
"metadataScore": 0.82,
|
|
93
|
+
"contentMatchRatio": 1,
|
|
94
|
+
"sampleSize": 50,
|
|
95
|
+
"matchedSignatures": [
|
|
96
|
+
"ipv4"
|
|
97
|
+
]
|
|
98
|
+
},
|
|
99
|
+
{
|
|
100
|
+
"table": "public.user_devices",
|
|
101
|
+
"column": "last_ip_address",
|
|
102
|
+
"dataType": "inet",
|
|
103
|
+
"confidence": 0.82,
|
|
104
|
+
"metadataScore": 0.82,
|
|
105
|
+
"contentMatchRatio": 0,
|
|
106
|
+
"sampleSize": 0,
|
|
107
|
+
"matchedSignatures": []
|
|
108
|
+
},
|
|
109
|
+
{
|
|
110
|
+
"table": "public.user_addresses",
|
|
111
|
+
"column": "pincode",
|
|
112
|
+
"dataType": "character varying",
|
|
113
|
+
"confidence": 0.78,
|
|
114
|
+
"metadataScore": 0.62,
|
|
115
|
+
"contentMatchRatio": 1,
|
|
116
|
+
"sampleSize": 50,
|
|
117
|
+
"matchedSignatures": [
|
|
118
|
+
"indian_pin_code"
|
|
119
|
+
]
|
|
120
|
+
}
|
|
121
|
+
],
|
|
122
|
+
"potentialLogicalLinks": [
|
|
123
|
+
{
|
|
124
|
+
"sourceTable": {
|
|
125
|
+
"schema": "public",
|
|
126
|
+
"table": "users"
|
|
127
|
+
},
|
|
128
|
+
"targetTable": {
|
|
129
|
+
"schema": "public",
|
|
130
|
+
"table": "abandoned_carts"
|
|
131
|
+
},
|
|
132
|
+
"column": "customer_id",
|
|
133
|
+
"reason": "Table exposes customer_id which conceptually maps to the root entity."
|
|
134
|
+
},
|
|
135
|
+
{
|
|
136
|
+
"sourceTable": {
|
|
137
|
+
"schema": "public",
|
|
138
|
+
"table": "users"
|
|
139
|
+
},
|
|
140
|
+
"targetTable": {
|
|
141
|
+
"schema": "public",
|
|
142
|
+
"table": "audit_logs"
|
|
143
|
+
},
|
|
144
|
+
"column": "actor_id",
|
|
145
|
+
"reason": "Table exposes actor_id which conceptually maps to the root entity."
|
|
146
|
+
},
|
|
147
|
+
{
|
|
148
|
+
"sourceTable": {
|
|
149
|
+
"schema": "public",
|
|
150
|
+
"table": "kyc_documents"
|
|
151
|
+
},
|
|
152
|
+
"targetTable": {
|
|
153
|
+
"schema": "public",
|
|
154
|
+
"table": "orders"
|
|
155
|
+
},
|
|
156
|
+
"column": "user_id",
|
|
157
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
158
|
+
},
|
|
159
|
+
{
|
|
160
|
+
"sourceTable": {
|
|
161
|
+
"schema": "public",
|
|
162
|
+
"table": "kyc_documents"
|
|
163
|
+
},
|
|
164
|
+
"targetTable": {
|
|
165
|
+
"schema": "public",
|
|
166
|
+
"table": "support_tickets"
|
|
167
|
+
},
|
|
168
|
+
"column": "user_id",
|
|
169
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
170
|
+
},
|
|
171
|
+
{
|
|
172
|
+
"sourceTable": {
|
|
173
|
+
"schema": "public",
|
|
174
|
+
"table": "kyc_documents"
|
|
175
|
+
},
|
|
176
|
+
"targetTable": {
|
|
177
|
+
"schema": "public",
|
|
178
|
+
"table": "user_addresses"
|
|
179
|
+
},
|
|
180
|
+
"column": "user_id",
|
|
181
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
182
|
+
},
|
|
183
|
+
{
|
|
184
|
+
"sourceTable": {
|
|
185
|
+
"schema": "public",
|
|
186
|
+
"table": "kyc_documents"
|
|
187
|
+
},
|
|
188
|
+
"targetTable": {
|
|
189
|
+
"schema": "public",
|
|
190
|
+
"table": "user_devices"
|
|
191
|
+
},
|
|
192
|
+
"column": "user_id",
|
|
193
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
194
|
+
},
|
|
195
|
+
{
|
|
196
|
+
"sourceTable": {
|
|
197
|
+
"schema": "public",
|
|
198
|
+
"table": "kyc_documents"
|
|
199
|
+
},
|
|
200
|
+
"targetTable": {
|
|
201
|
+
"schema": "public",
|
|
202
|
+
"table": "user_preferences"
|
|
203
|
+
},
|
|
204
|
+
"column": "user_id",
|
|
205
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
206
|
+
},
|
|
207
|
+
{
|
|
208
|
+
"sourceTable": {
|
|
209
|
+
"schema": "public",
|
|
210
|
+
"table": "orders"
|
|
211
|
+
},
|
|
212
|
+
"targetTable": {
|
|
213
|
+
"schema": "public",
|
|
214
|
+
"table": "support_tickets"
|
|
215
|
+
},
|
|
216
|
+
"column": "user_id",
|
|
217
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
218
|
+
},
|
|
219
|
+
{
|
|
220
|
+
"sourceTable": {
|
|
221
|
+
"schema": "public",
|
|
222
|
+
"table": "orders"
|
|
223
|
+
},
|
|
224
|
+
"targetTable": {
|
|
225
|
+
"schema": "public",
|
|
226
|
+
"table": "user_addresses"
|
|
227
|
+
},
|
|
228
|
+
"column": "user_id",
|
|
229
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
230
|
+
},
|
|
231
|
+
{
|
|
232
|
+
"sourceTable": {
|
|
233
|
+
"schema": "public",
|
|
234
|
+
"table": "orders"
|
|
235
|
+
},
|
|
236
|
+
"targetTable": {
|
|
237
|
+
"schema": "public",
|
|
238
|
+
"table": "user_devices"
|
|
239
|
+
},
|
|
240
|
+
"column": "user_id",
|
|
241
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
242
|
+
},
|
|
243
|
+
{
|
|
244
|
+
"sourceTable": {
|
|
245
|
+
"schema": "public",
|
|
246
|
+
"table": "orders"
|
|
247
|
+
},
|
|
248
|
+
"targetTable": {
|
|
249
|
+
"schema": "public",
|
|
250
|
+
"table": "user_preferences"
|
|
251
|
+
},
|
|
252
|
+
"column": "user_id",
|
|
253
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
254
|
+
},
|
|
255
|
+
{
|
|
256
|
+
"sourceTable": {
|
|
257
|
+
"schema": "public",
|
|
258
|
+
"table": "support_tickets"
|
|
259
|
+
},
|
|
260
|
+
"targetTable": {
|
|
261
|
+
"schema": "public",
|
|
262
|
+
"table": "user_addresses"
|
|
263
|
+
},
|
|
264
|
+
"column": "user_id",
|
|
265
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
266
|
+
},
|
|
267
|
+
{
|
|
268
|
+
"sourceTable": {
|
|
269
|
+
"schema": "public",
|
|
270
|
+
"table": "support_tickets"
|
|
271
|
+
},
|
|
272
|
+
"targetTable": {
|
|
273
|
+
"schema": "public",
|
|
274
|
+
"table": "user_devices"
|
|
275
|
+
},
|
|
276
|
+
"column": "user_id",
|
|
277
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
278
|
+
},
|
|
279
|
+
{
|
|
280
|
+
"sourceTable": {
|
|
281
|
+
"schema": "public",
|
|
282
|
+
"table": "support_tickets"
|
|
283
|
+
},
|
|
284
|
+
"targetTable": {
|
|
285
|
+
"schema": "public",
|
|
286
|
+
"table": "user_preferences"
|
|
287
|
+
},
|
|
288
|
+
"column": "user_id",
|
|
289
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
290
|
+
},
|
|
291
|
+
{
|
|
292
|
+
"sourceTable": {
|
|
293
|
+
"schema": "public",
|
|
294
|
+
"table": "user_addresses"
|
|
295
|
+
},
|
|
296
|
+
"targetTable": {
|
|
297
|
+
"schema": "public",
|
|
298
|
+
"table": "user_devices"
|
|
299
|
+
},
|
|
300
|
+
"column": "user_id",
|
|
301
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
302
|
+
},
|
|
303
|
+
{
|
|
304
|
+
"sourceTable": {
|
|
305
|
+
"schema": "public",
|
|
306
|
+
"table": "user_addresses"
|
|
307
|
+
},
|
|
308
|
+
"targetTable": {
|
|
309
|
+
"schema": "public",
|
|
310
|
+
"table": "user_preferences"
|
|
311
|
+
},
|
|
312
|
+
"column": "user_id",
|
|
313
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
314
|
+
},
|
|
315
|
+
{
|
|
316
|
+
"sourceTable": {
|
|
317
|
+
"schema": "public",
|
|
318
|
+
"table": "user_devices"
|
|
319
|
+
},
|
|
320
|
+
"targetTable": {
|
|
321
|
+
"schema": "public",
|
|
322
|
+
"table": "user_preferences"
|
|
323
|
+
},
|
|
324
|
+
"column": "user_id",
|
|
325
|
+
"reason": "Both tables expose user_id but no physical foreign key was found."
|
|
326
|
+
},
|
|
327
|
+
{
|
|
328
|
+
"sourceTable": {
|
|
329
|
+
"schema": "public",
|
|
330
|
+
"table": "users"
|
|
331
|
+
},
|
|
332
|
+
"targetTable": {
|
|
333
|
+
"schema": "public",
|
|
334
|
+
"table": "legacy_crm_notes"
|
|
335
|
+
},
|
|
336
|
+
"column": "client_id",
|
|
337
|
+
"reason": "Table exposes client_id which conceptually maps to the root entity."
|
|
338
|
+
},
|
|
339
|
+
{
|
|
340
|
+
"sourceTable": {
|
|
341
|
+
"schema": "public",
|
|
342
|
+
"table": "users"
|
|
343
|
+
},
|
|
344
|
+
"targetTable": {
|
|
345
|
+
"schema": "public",
|
|
346
|
+
"table": "marketing_campaign_clicks"
|
|
347
|
+
},
|
|
348
|
+
"column": "target_email",
|
|
349
|
+
"reason": "Table exposes target_email which conceptually maps to the root entity."
|
|
350
|
+
},
|
|
351
|
+
{
|
|
352
|
+
"sourceTable": {
|
|
353
|
+
"schema": "public",
|
|
354
|
+
"table": "users"
|
|
355
|
+
},
|
|
356
|
+
"targetTable": {
|
|
357
|
+
"schema": "public",
|
|
358
|
+
"table": "third_party_telemetry"
|
|
359
|
+
},
|
|
360
|
+
"column": "user_uuid",
|
|
361
|
+
"reason": "Table exposes user_uuid which conceptually maps to the root entity."
|
|
362
|
+
}
|
|
363
|
+
],
|
|
364
|
+
"nextSteps": [
|
|
365
|
+
"Review every PII column and potential logical link with the application owner.",
|
|
366
|
+
"Copy reviewed targets into compliance.worker.yml and complete legal_attestation.",
|
|
367
|
+
"Run compliance-worker check-integrity before allowing live worker boot.",
|
|
368
|
+
"Sign the reviewed manifest with compliance-worker sign after DPO approval."
|
|
369
|
+
]
|
|
370
|
+
}
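The `summary` counts in this report can be cross-checked against the `findings` array. The sketch below recomputes them under one assumption that the diff does not confirm: that a finding counts as high-confidence at a score of 0.9 or above. That cutoff is inferred from the sample data (six findings at 0.9 or higher, three below), and `summarize` is a hypothetical helper, not part of the package.

```typescript
type Finding = { table: string; column: string; confidence: number };

// Assumption: 0.9 separates "high confidence" from "review required".
function summarize(findings: Finding[], highConfidence = 0.9) {
  const tables = new Set(findings.map((f) => f.table));
  const high = findings.filter((f) => f.confidence >= highConfidence).length;
  return {
    tablesWithPii: tables.size,
    piiColumnCount: findings.length,
    highConfidenceCount: high,
    reviewRequiredCount: findings.length - high,
  };
}

// Table, column, and confidence values copied from the report above.
const reportFindings: Finding[] = [
  { table: "public.legacy_crm_notes", column: "agent_notes", confidence: 0.95 },
  { table: "public.marketing_campaign_clicks", column: "target_email", confidence: 0.95 },
  { table: "public.users", column: "email", confidence: 0.95 },
  { table: "public.users", column: "phone_number", confidence: 0.92 },
  { table: "public.support_tickets", column: "description", confidence: 0.9 },
  { table: "public.ticket_messages", column: "message_body", confidence: 0.9 },
  { table: "public.audit_logs", column: "ip_address", confidence: 0.82 },
  { table: "public.user_devices", column: "last_ip_address", confidence: 0.82 },
  { table: "public.user_addresses", column: "pincode", confidence: 0.78 },
];

console.log(summarize(reportFindings));
```

Under that assumed cutoff, the recomputed values agree with the report's `tablesWithPii` (8), `piiColumnCount` (9), `highConfidenceCount` (6), and `reviewRequiredCount` (3).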
|
package/report.md
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# Compliance Introspector Report
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
- Root table: `public.users`
|
|
6
|
+
- Generated at: `2026-06-12T05:19:07.235Z`
|
|
7
|
+
- Schema hash: `ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c`
|
|
8
|
+
- DAG targets: 15
|
|
9
|
+
- Tables with PII: 8
|
|
10
|
+
- PII columns: 9
|
|
11
|
+
- High-confidence findings: 6
|
|
12
|
+
- Review-required findings: 3
|
|
13
|
+
- Potential logical links: 20
|
|
14
|
+
|
|
15
|
+
## PII Findings
|
|
16
|
+
|
|
17
|
+
| Table | Column | Type | Confidence | Metadata | Content | Signatures |
|
|
18
|
+
| --- | --- | --- | ---: | ---: | ---: | --- |
|
|
19
|
+
| `public.legacy_crm_notes` | `agent_notes` | `text` | 0.950 | 0.000 | 1.000 | email, indian_mobile |
|
|
20
|
+
| `public.marketing_campaign_clicks` | `target_email` | `character varying` | 0.950 | 0.920 | 1.000 | email |
|
|
21
|
+
| `public.users` | `email` | `character varying` | 0.950 | 0.920 | 1.000 | email |
|
|
22
|
+
| `public.users` | `phone_number` | `character varying` | 0.920 | 0.920 | 1.000 | indian_mobile |
|
|
23
|
+
| `public.support_tickets` | `description` | `text` | 0.900 | 0.000 | 1.000 | indian_mobile |
|
|
24
|
+
| `public.ticket_messages` | `message_body` | `text` | 0.900 | 0.000 | 1.000 | indian_mobile |
|
|
25
|
+
| `public.audit_logs` | `ip_address` | `inet` | 0.820 | 0.820 | 1.000 | ipv4 |
|
|
26
|
+
| `public.user_devices` | `last_ip_address` | `inet` | 0.820 | 0.820 | 0.000 | metadata |
|
|
27
|
+
| `public.user_addresses` | `pincode` | `character varying` | 0.780 | 0.620 | 1.000 | indian_pin_code |
|
|
28
|
+
|
|
29
|
+
## Potential Logical Links
|
|
30
|
+
|
|
31
|
+
- `public.users.customer_id` <-> `public.abandoned_carts.customer_id`: Table exposes customer_id which conceptually maps to the root entity.
|
|
32
|
+
- `public.users.actor_id` <-> `public.audit_logs.actor_id`: Table exposes actor_id which conceptually maps to the root entity.
|
|
33
|
+
- `public.kyc_documents.user_id` <-> `public.orders.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
34
|
+
- `public.kyc_documents.user_id` <-> `public.support_tickets.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
35
|
+
- `public.kyc_documents.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
36
|
+
- `public.kyc_documents.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
37
|
+
- `public.kyc_documents.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
38
|
+
- `public.orders.user_id` <-> `public.support_tickets.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
39
|
+
- `public.orders.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
40
|
+
- `public.orders.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
41
|
+
- `public.orders.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
42
|
+
- `public.support_tickets.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
43
|
+
- `public.support_tickets.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
44
|
+
- `public.support_tickets.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
45
|
+
- `public.user_addresses.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
46
|
+
- `public.user_addresses.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
47
|
+
- `public.user_devices.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
|
|
48
|
+
- `public.users.client_id` <-> `public.legacy_crm_notes.client_id`: Table exposes client_id which conceptually maps to the root entity.
|
|
49
|
+
- `public.users.target_email` <-> `public.marketing_campaign_clicks.target_email`: Table exposes target_email which conceptually maps to the root entity.
|
|
50
|
+
- `public.users.user_uuid` <-> `public.third_party_telemetry.user_uuid`: Table exposes user_uuid which conceptually maps to the root entity.
|
|
51
|
+
|
|
52
|
+
## Next Steps
|
|
53
|
+
|
|
54
|
+
- Review every PII column and potential logical link with the application owner.
|
|
55
|
+
- Copy reviewed targets into compliance.worker.yml and complete legal_attestation.
|
|
56
|
+
- Run compliance-worker check-integrity before allowing live worker boot.
|
|
57
|
+
- Sign the reviewed manifest with compliance-worker sign after DPO approval.
|
package/src/modules/engine/vault/satellite-mutation.ts
CHANGED
|
@@ -52,9 +52,9 @@ async function hardDeleteSatelliteRows(
|
|
|
52
52
|
|
|
53
53
|
while (true) {
|
|
54
54
|
const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
|
|
55
|
-
const deletedRows = await tx<{
|
|
55
|
+
const deletedRows = await tx<{ ctid: string }[]>`
|
|
56
56
|
WITH batch AS (
|
|
57
|
-
SELECT
|
|
57
|
+
SELECT ctid
|
|
58
58
|
FROM ${tx(appSchema)}.${tx(tableName)}
|
|
59
59
|
WHERE ${tx(lookupColumn)} = ${lookupValue}
|
|
60
60
|
${tenantFilter}
|
|
@@ -62,8 +62,8 @@ async function hardDeleteSatelliteRows(
|
|
|
62
62
|
FOR UPDATE SKIP LOCKED
|
|
63
63
|
)
|
|
64
64
|
DELETE FROM ${tx(appSchema)}.${tx(tableName)}
|
|
65
|
-
WHERE
|
|
66
|
-
RETURNING
|
|
65
|
+
WHERE ctid IN (SELECT ctid FROM batch)
|
|
66
|
+
RETURNING ctid
|
|
67
67
|
`;
|
|
68
68
|
|
|
69
69
|
if (deletedRows.length === 0) {
|
package/src/modules/engine/vault/satellite.ts
CHANGED
|
@@ -3,7 +3,7 @@ import type { Tsql } from "@/types";
|
|
|
3
3
|
import { assertIdentifier } from "@/utils";
|
|
4
4
|
|
|
5
5
|
interface SatelliteRowId {
|
|
6
|
-
|
|
6
|
+
ctid: string;
|
|
7
7
|
}
|
|
8
8
|
|
|
9
9
|
async function yieldWorkerEventLoop(): Promise<void> {
|
|
@@ -78,7 +78,7 @@ export async function redactSatelliteTable(
|
|
|
78
78
|
const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
|
|
79
79
|
const updatedRows = await tx<SatelliteRowId[]>`
|
|
80
80
|
WITH batch AS (
|
|
81
|
-
SELECT
|
|
81
|
+
SELECT ctid
|
|
82
82
|
FROM ${tx(schema)}.${tx(table)}
|
|
83
83
|
WHERE ${tx(safeLookupColumn)} = ${lookupValue}
|
|
84
84
|
${tenantFilter}
|
|
@@ -87,8 +87,8 @@ export async function redactSatelliteTable(
|
|
|
87
87
|
)
|
|
88
88
|
UPDATE ${tx(schema)}.${tx(table)}
|
|
89
89
|
SET ${tx(safeLookupColumn)} = ${newHmacValue}
|
|
90
|
-
WHERE
|
|
91
|
-
RETURNING
|
|
90
|
+
WHERE ctid IN (SELECT ctid FROM batch)
|
|
91
|
+
RETURNING ctid
|
|
92
92
|
`;
|
|
93
93
|
|
|
94
94
|
if (updatedRows.length === 0) {
|
package/src/modules/introspector/classifier.ts
CHANGED
|
@@ -137,6 +137,7 @@ const METADATA_PATTERNS: Array<{ pattern: RegExp; score: number }> = [
|
|
|
137
137
|
{ pattern: /(^|_)(driving_license|driving_licence|license_number|licence_number|dl_number|dl_no)($|_)/i, score: MEDIUM_METADATA_SCORE },
|
|
138
138
|
{ pattern: /(^|_)(address|street|postal_code|zip_code|pin_code|pincode)($|_)/i, score: WEAK_METADATA_SCORE },
|
|
139
139
|
{ pattern: /(^|_)(device_fingerprint|device_id|advertising_id|gaid|idfa)($|_)/i, score: WEAK_METADATA_SCORE },
|
|
140
|
+
{ pattern: /(^|_)(document_number|identity_number|id_number)($|_)/i, score: WEAK_METADATA_SCORE },
|
|
140
141
|
];
|
|
141
142
|
|
|
142
143
|
function qualifiedKey(table: QualifiedTable): string {
|
|
@@ -336,12 +337,23 @@ function classifyLeafDetailed(value: string, columnName: string = ""): ContentSi
|
|
|
336
337
|
const bytes = textEncoder.encode(value.trim());
|
|
337
338
|
try {
|
|
338
339
|
const normalized = textDecoder.decode(bytes).trim();
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
|
|
343
|
-
|
|
344
|
-
|
|
340
|
+
// Split into tokens and strip leading/trailing punctuation so regexes can match substrings
|
|
341
|
+
const tokens = normalized.split(/\s+/).map((t) => t.replace(/^[^\w\+]+|[^\w]+$/g, ""));
|
|
342
|
+
const candidates = Array.from(new Set([normalized, ...tokens])).filter((t) => t.length > 0);
|
|
343
|
+
|
|
344
|
+
const matches = new Set<ContentSignature>();
|
|
345
|
+
for (const candidate of candidates) {
|
|
346
|
+
for (const signature of CONTENT_SIGNATURES) {
|
|
347
|
+
if (
|
|
348
|
+
signatureHasMetadataSupport(signature, columnName) &&
|
|
349
|
+
signature.pattern.test(candidate) &&
|
|
350
|
+
(!signature.validate || signature.validate(candidate))
|
|
351
|
+
) {
|
|
352
|
+
matches.add(signature);
|
|
353
|
+
}
|
|
354
|
+
}
|
|
355
|
+
}
|
|
356
|
+
return Array.from(matches);
|
|
345
357
|
} finally {
|
|
346
358
|
bytes.fill(0);
|
|
347
359
|
}
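The tokenization added in this hunk can be tried in isolation. The following is a minimal sketch, not the package's code: the `Signature` shape, the `SIGNATURES` list, and both regexes are hypothetical stand-ins for the `CONTENT_SIGNATURES` the diff refers to, and the `signatureHasMetadataSupport` check is left out. Only the split, strip, and de-duplicate steps are copied from the added lines.

```typescript
// Hypothetical signature shape; the real ContentSignature type is not shown in the diff.
interface Signature {
  name: string;
  pattern: RegExp;
  validate?: (candidate: string) => boolean;
}

// Illustrative anchored patterns only (assumptions, not the package's regexes).
const SIGNATURES: Signature[] = [
  { name: "email", pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/ },
  { name: "indian_mobile", pattern: /^(?:\+91)?[6-9]\d{9}$/ },
];

function classifyFreeText(value: string): string[] {
  const normalized = value.trim();
  // Same tokenization as the hunk: split on whitespace, then strip leading and
  // trailing punctuation from each token (a leading "+" is kept for phone numbers).
  const tokens = normalized.split(/\s+/).map((t) => t.replace(/^[^\w\+]+|[^\w]+$/g, ""));
  // Test the whole value plus every distinct non-empty token.
  const candidates = Array.from(new Set([normalized, ...tokens])).filter((t) => t.length > 0);

  const matches = new Set<string>();
  for (const candidate of candidates) {
    for (const signature of SIGNATURES) {
      if (signature.pattern.test(candidate) && (!signature.validate || signature.validate(candidate))) {
        matches.add(signature.name);
      }
    }
  }
  return Array.from(matches);
}
```

With anchored patterns like these, a free-text note such as `Call me at +919876543210, or mail (a.b@example.com).` matches nothing as a whole value, but its stripped tokens match both signatures. That is consistent with free-text columns such as `agent_notes` appearing in the report with content signatures.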
|
package/src/modules/introspector/dag.ts
CHANGED
|
@@ -236,7 +236,7 @@ export async function discoverPotentialLogicalLinks(
|
|
|
236
236
|
const byColumn = new Map<string, QualifiedTable[]>();
|
|
237
237
|
for (const row of rows) {
|
|
238
238
|
const normalized = row.column_name.toLowerCase();
|
|
239
|
-
if (!/^(?:user_id|account_id|customer_id|member_id|subject_id|.*_user_id)$/.test(normalized)) {
|
|
239
|
+
if (!/^(?:user_id|account_id|customer_id|client_id|actor_id|user_uuid|member_id|subject_id|.*_user_id|target_email|user_email)$/.test(normalized)) {
|
|
240
240
|
continue;
|
|
241
241
|
}
|
|
242
242
|
|
|
@@ -247,7 +247,25 @@ export async function discoverPotentialLogicalLinks(
|
|
|
247
247
|
|
|
248
248
|
const links: PotentialLogicalLink[] = [];
|
|
249
249
|
const emitted = new Set<string>();
|
|
250
|
+
|
|
250
251
|
for (const [column, tables] of byColumn.entries()) {
|
|
252
|
+
for (const table of tables) {
|
|
253
|
+
// Explicitly link any orphan table that has an identity-like column to the root table
|
|
254
|
+
if (table.schema === root.schema && table.table === root.table) {
|
|
255
|
+
continue;
|
|
256
|
+
}
|
|
257
|
+
const key = physicalLinkKey(root, table, column);
|
|
258
|
+
if (!physicalLinks.has(key) && !emitted.has(key)) {
|
|
259
|
+
emitted.add(key);
|
|
260
|
+
links.push({
|
|
261
|
+
sourceTable: root,
|
|
262
|
+
targetTable: table,
|
|
263
|
+
column,
|
|
264
|
+
reason: `Table exposes ${column} which conceptually maps to the root entity.`,
|
|
265
|
+
});
|
|
266
|
+
}
|
|
267
|
+
}
|
|
268
|
+
|
|
251
269
|
if (tables.length < 2) {
|
|
252
270
|
continue;
|
|
253
271
|
}
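The root-linking loop added above can be sketched as a standalone function. This is an illustration under assumptions: `linkKey` is a hypothetical replacement for the package's `physicalLinkKey` (whose key format the diff does not show), and the interfaces only carry the fields visible in the hunk.

```typescript
interface QualifiedTable {
  schema: string;
  table: string;
}

interface PotentialLogicalLink {
  sourceTable: QualifiedTable;
  targetTable: QualifiedTable;
  column: string;
  reason: string;
}

// Hypothetical key format; the package's physicalLinkKey is not shown in the diff.
function linkKey(a: QualifiedTable, b: QualifiedTable, column: string): string {
  return `${a.schema}.${a.table}->${b.schema}.${b.table}:${column}`;
}

function linkOrphansToRoot(
  root: QualifiedTable,
  byColumn: Map<string, QualifiedTable[]>,
  physicalLinks: Set<string>,
): PotentialLogicalLink[] {
  const links: PotentialLogicalLink[] = [];
  const emitted = new Set<string>();
  for (const [column, tables] of byColumn) {
    for (const table of tables) {
      // The root table is never linked to itself.
      if (table.schema === root.schema && table.table === root.table) continue;
      const key = linkKey(root, table, column);
      // Skip pairs already covered by a physical foreign key or already emitted.
      if (physicalLinks.has(key) || emitted.has(key)) continue;
      emitted.add(key);
      links.push({
        sourceTable: root,
        targetTable: table,
        column,
        reason: `Table exposes ${column} which conceptually maps to the root entity.`,
      });
    }
  }
  return links;
}
```

Each emitted link carries a single `column` name taken from the non-root table, which would explain why the generated report prints entries such as `public.users.client_id` on the root side of a link.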
|
package/src/modules/introspector/run.ts
CHANGED
|
@@ -38,14 +38,6 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
|
|
|
38
38
|
maxDepth,
|
|
39
39
|
});
|
|
40
40
|
|
|
41
|
-
const classifiedColumns = await classifyDagTargets({
|
|
42
|
-
sql: options.sql,
|
|
43
|
-
targets: dag,
|
|
44
|
-
samplePercent: options.samplePercent,
|
|
45
|
-
sampleLimit: options.sampleLimit,
|
|
46
|
-
threshold: options.threshold,
|
|
47
|
-
});
|
|
48
|
-
|
|
49
41
|
const [schemaHash, potentialLogicalLinks] = await Promise.all([
|
|
50
42
|
detectSchemaDrift(options.sql, root.schema),
|
|
51
43
|
discoverPotentialLogicalLinks(
|
|
@@ -58,7 +50,57 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
|
|
|
58
50
|
),
|
|
59
51
|
]);
|
|
60
52
|
|
|
61
|
-
const
|
|
53
|
+
const dagTableKeys = new Set(dag.map((t) => targetKey(t.table.schema, t.table.table)));
|
|
54
|
+
const logicalTargets: typeof dag = [];
|
|
55
|
+
|
|
56
|
+
for (const link of potentialLogicalLinks) {
|
|
57
|
+
const sourceKey = targetKey(link.sourceTable.schema, link.sourceTable.table);
|
|
58
|
+
const targetKeyStr = targetKey(link.targetTable.schema, link.targetTable.table);
|
|
59
|
+
|
|
60
|
+
let parentCol = link.column;
|
|
61
|
+
let childCol = link.column;
|
|
62
|
+
|
|
63
|
+
// Attempt intelligent primary key mapping for orphaned root links
|
|
64
|
+
if (link.sourceTable.schema === root.schema && link.sourceTable.table === root.table) {
|
|
65
|
+
parentCol = ["target_email", "user_email", "email_address"].includes(link.column) ? "email" : "id";
|
|
66
|
+
}
|
|
67
|
+
|
|
68
|
+
if (dagTableKeys.has(sourceKey) && !dagTableKeys.has(targetKeyStr)) {
|
|
69
|
+
dagTableKeys.add(targetKeyStr);
|
|
70
|
+
logicalTargets.push({
|
|
71
|
+
table: link.targetTable,
|
|
72
|
+
parentTable: link.sourceTable,
|
|
73
|
+
constraintName: null,
|
|
74
|
+
childColumns: [childCol],
|
|
75
|
+
parentColumns: [parentCol],
|
|
76
|
+
depth: maxDepth,
|
|
77
|
+
fkCondition: `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`,
|
|
78
|
+
});
|
|
79
|
+
} else if (dagTableKeys.has(targetKeyStr) && !dagTableKeys.has(sourceKey)) {
|
|
80
|
+
dagTableKeys.add(sourceKey);
|
|
81
|
+
logicalTargets.push({
|
|
82
|
+
table: link.sourceTable,
|
|
83
|
+
parentTable: link.targetTable,
|
|
84
|
+
constraintName: null,
|
|
85
|
+
childColumns: [childCol],
|
|
86
|
+
parentColumns: [parentCol],
|
|
87
|
+
depth: maxDepth,
|
|
88
|
+
fkCondition: `${formatQualifiedTable(link.targetTable)}.${parentCol} = ${formatQualifiedTable(link.sourceTable)}.${childCol}`,
|
|
89
|
+
});
|
|
90
|
+
}
|
|
91
|
+
}
|
|
92
|
+
|
|
93
|
+
const fullTargets = [...dag, ...logicalTargets];
|
|
94
|
+
|
|
95
|
+
const classifiedColumns = await classifyDagTargets({
|
|
96
|
+
sql: options.sql,
|
|
97
|
+
targets: fullTargets,
|
|
98
|
+
samplePercent: options.samplePercent,
|
|
99
|
+
sampleLimit: options.sampleLimit,
|
|
100
|
+
threshold: options.threshold,
|
|
101
|
+
});
|
|
102
|
+
|
|
103
|
+
const targets: IntrospectorTargetDraft[] = fullTargets.map((target) => ({
|
|
62
104
|
table: target.table,
|
|
63
105
|
parentTable: target.parentTable,
|
|
64
106
|
fkCondition: target.fkCondition,
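The parent-column guess in this hunk decides which join condition is written for a table linked only logically to the root. A minimal sketch of that heuristic follows. It is not the package's code: `formatQualifiedTable` is re-implemented here on the assumption that it renders `schema.table`, and `guessRootColumn` and `guessedJoin` are hypothetical helper names.

```typescript
interface QualifiedTable {
  schema: string;
  table: string;
}

// Assumed rendering ("schema.table"); the package's formatQualifiedTable is not shown in the diff.
function formatQualifiedTable(t: QualifiedTable): string {
  return `${t.schema}.${t.table}`;
}

// Column names the hunk treats as email-like.
const EMAIL_LIKE_COLUMNS = ["target_email", "user_email", "email_address"];

// Mirrors the hunk's heuristic: email-like child columns join on the root's
// "email" column, every other column joins on the root's "id" column.
function guessRootColumn(childColumn: string): string {
  return EMAIL_LIKE_COLUMNS.includes(childColumn) ? "email" : "id";
}

function guessedJoin(root: QualifiedTable, child: QualifiedTable, childColumn: string): string {
  const parentColumn = guessRootColumn(childColumn);
  return `${formatQualifiedTable(root)}.${parentColumn} = ${formatQualifiedTable(child)}.${childColumn}`;
}
```

Under this heuristic a column such as `user_uuid` is joined against the root's `id`, which is only a guess about the root schema. The manifest header added in `yaml.ts` asks reviewers to verify any join the Introspector guessed for an orphaned table.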
|
package/src/modules/introspector/yaml.ts
CHANGED
|
@@ -83,6 +83,20 @@ export function renderIntrospectorYaml(draft: IntrospectorDraft): string {
|
|
|
83
83
|
"legal_disclaimer:",
|
|
84
84
|
" text: \"Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings.\"",
|
|
85
85
|
"",
|
|
86
|
+
"# ===================================================================================",
|
|
87
|
+
"# HOW TO READ THIS MANIFEST:",
|
|
88
|
+
"# - 'targets': The list of tables the worker will delete/redact from.",
|
|
89
|
+
"# - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.",
|
|
90
|
+
"# - 'join': The SQL condition used to link the child table to the parent table.",
|
|
91
|
+
"# - 'pii_columns': Columns identified as containing Personal Identifiable Information.",
|
|
92
|
+
"# - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).",
|
|
93
|
+
"#",
|
|
94
|
+
"# IMPORTANT:",
|
|
95
|
+
"# 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.",
|
|
96
|
+
"# 2. Review all 'pii_columns' to ensure no sensitive columns were missed.",
|
|
97
|
+
"# 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.",
|
|
98
|
+
"# ===================================================================================",
|
|
99
|
+
"",
|
|
86
100
|
"rules:",
|
|
87
101
|
" - id: dpdp_standard",
|
|
88
102
|
` root_table: ${yamlScalar(formatQualifiedTable(draft.root))}`,
|