dpdp-erasure-cli 1.0.11 → 1.1.0

package/README.md CHANGED
@@ -1,31 +1,38 @@
1
- # Operator CLI Reference Guide (`dpdp-erasure-cli`)
1
+ # dpdp-erasure-cli
2
2
 
3
3
  [![npm version](https://badge.fury.io/js/dpdp-erasure-cli.svg)](https://badge.fury.io/js/dpdp-erasure-cli)
4
4
 
5
- The **Operator CLI** (`dpdp-erasure-cli`) is the primary interface for data engineers, DevOps, and privacy operators to manage the DPDP Erasure Engine.
5
+ **The DPDP Erasure Engine CLI** is an automated, AI-assisted privacy toolkit that helps you securely discover, map, and cryptographically shred PII (Personally Identifiable Information) in your database.
6
6
 
7
- It is used to introspect databases, classify PII, manage compliance manifests, sign them cryptographically, and simulate safe erasure operations locally.
7
+ It acts as the control plane for the [DPDP Erasure Engine](https://github.com/devxdh/dpdp-erasure-engine), allowing Data Protection Officers (DPOs) and Software Engineers to effortlessly comply with global privacy laws (DPDP, GDPR, CCPA) without writing manual SQL deletion scripts.
8
+
9
+ ---
10
+
11
+ ## 🎯 What does it do?
12
+
13
+ Manually deleting a user across dozens of microservice tables is dangerous and prone to failure. `dpdp-erasure-cli` solves this by:
14
+
15
+ 1. **Introspection & NLP Mapping:** Safely scans your live database (using `TABLESAMPLE` block sampling) to find hidden PII in text columns, JSON blobs, and orphaned tables.
16
+ 2. **DAG Compilation:** Maps your entire Foreign Key graph to figure out the exact order tables must be deleted to avoid database constraint violations.
17
+ 3. **Drafting a Manifest:** Automatically generates a `compliance.worker.yml` erasure plan that handles HMAC redaction vs. hard deletion.
18
+ 4. **Cryptographic Signatures:** Locks the manifest using Ed25519 signatures so production deletion rules cannot be silently altered.
19
+ 5. **Dry-Run Simulations:** Tests the erasure locally in a rolled-back PostgreSQL transaction to prove it works before you deploy.
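The DAG compilation step in the list above can be illustrated with a small sketch. It is not the engine's actual compiler: the `Target` shape (a `table` plus an optional `parent`, mirroring the fields of the generated manifest) and the `deletionOrder` helper are hypothetical names introduced here. The idea is a post-order traversal, so every child table is emitted before the parent it references.

```typescript
// Hypothetical, simplified view of a manifest target: a table and its parent.
interface Target {
  table: string;
  parent?: string;
}

// Returns tables ordered so that every child appears before its parent
// (post-order depth-first traversal of the parent -> children graph).
function deletionOrder(targets: Target[]): string[] {
  const children = new Map<string, string[]>();
  for (const t of targets) {
    if (t.parent !== undefined) {
      const siblings = children.get(t.parent) ?? [];
      siblings.push(t.table);
      children.set(t.parent, siblings);
    }
  }

  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (table: string): void => {
    const s = state.get(table);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`Cycle detected at ${table}`);
    state.set(table, "visiting");
    for (const child of children.get(table) ?? []) visit(child);
    state.set(table, "done");
    order.push(table); // emitted only after all of its children
  };

  for (const t of targets) visit(t.table);
  return order;
}

const order = deletionOrder([
  { table: "public.users" },
  { table: "public.orders", parent: "public.users" },
  { table: "public.order_items", parent: "public.orders" },
  { table: "public.payments", parent: "public.orders" },
]);
console.log(order);
```

Deleting in this order removes `public.order_items` and `public.payments` before `public.orders`, and `public.orders` before `public.users`, which is the ordering a foreign key constraint requires.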
8
20
 
9
21
  ---
10
22
 
11
23
  ## 🚀 Installation
12
24
 
13
- This CLI requires [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution. Ensure you have Bun installed, then install the package globally:
25
+ This CLI relies on [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution.
14
26
 
15
27
  ```bash
16
28
  npm install -g dpdp-erasure-cli
17
29
  ```
18
30
 
19
- *Alternatively, if running from the monorepo root:*
20
- ```bash
21
- bun run --cwd ./apps/worker cli <command>
22
- ```
23
-
24
31
  ---
25
32
 
26
- ## 🛠️ Interactive Console
33
+ ## 🛠️ Interactive Setup
27
34
 
28
- If you run the CLI without any arguments, it will launch an interactive wizard to guide you through the available operations:
35
+ Don't want to memorize commands? Just run the CLI with no arguments to launch the interactive wizard:
29
36
 
30
37
  ```bash
31
38
  dpdp-cli
@@ -33,108 +40,56 @@ dpdp-cli
33
40
 
34
41
  ---
35
42
 
36
- ## 📚 Core Commands & Configuration Guide
43
+ ## 📚 Quick Start Guide
44
+
45
+ Setting up your database for privacy compliance follows this simple 5-step workflow:
37
46
 
38
- ### 1. `introspect` (The Core Command)
39
- Safely analyze your database's Foreign Key (FK) Directed Acyclic Graph (DAG) offline and draft a comprehensive PII mapping manifest (`compliance.worker.yml`). This is the first step in setting up the engine.
47
+ ### 1. Introspect Your Database
48
+ Safely analyze your schema to discover PII and draft the deletion manifest. The AI will even find logical links if you don't use strict Foreign Keys!
40
49
 
41
50
  ```bash
42
51
  dpdp-cli introspect \
43
- --url postgres://user:pass@localhost:5432/app_db \
52
+ --url "postgres://user:pass@localhost:5432/app_db" \
44
53
  --root public.users \
45
54
  --schema public \
46
- --output ./compliance.worker.yml.draft
55
+ --output ./compliance.worker.yml
47
56
  ```
48
57
 
49
- **Options:**
50
- * `-u, --url <url>`: PostgreSQL Connection DSN.
51
- * `-r, --root <table>`: The root table containing the user/subject identifier (e.g., `public.users`).
52
- * `-s, --schema <schema>`: The target PostgreSQL schema (defaults to `public`).
53
- * `-o, --output <path>`: Where to write the generated YAML draft (defaults to `compliance.worker.yml.draft`).
54
- * `-d, --max-depth <depth>`: Limit for recursive Foreign Key traversal (default: `32`).
55
- * `--sample-percent <percent>`: Percentage of data to sample using `TABLESAMPLE` for PII detection (default: `1`).
56
- * `--threshold <score>`: Confidence score required to flag a column as PII (default: `0.75`).
57
- * `--report <path>`: Write a readable Markdown report of the findings.
58
-
59
- ---
60
-
61
- ### 2. `scan` (Quick PII Check)
62
- A lightweight, metadata-only schema scan that looks for potential PII columns based purely on naming conventions, without the heavy block sampling used by `introspect`.
63
-
64
- ```bash
65
- dpdp-cli scan --url "postgres://user:pass@localhost:5432/app_db" --schema public
66
- ```
67
-
68
- ---
69
-
70
- ### 3. `keygen` (Security Provisioning)
71
- Provisions secure Ed25519 cryptographic keys required to sign your configuration manifest.
58
+ ### 2. Review and Attest
59
+ Open the generated `compliance.worker.yml`. Review the `targets` and `join` conditions. Once you are confident, sign off by updating the `legal_attestation` block.
72
60
 
61
+ ### 3. Generate Security Keys
62
+ Create a private/public keypair to securely sign your manifest for production environments.
73
63
  ```bash
74
64
  dpdp-cli keygen
75
65
  ```
76
- *This generates a private key file (e.g., `worker.pkcs8.key`) and a public key.*
77
-
78
- ---
79
-
80
- ### 4. `sign` (Cryptographic Manifest Lock)
81
- To prevent unauthorized changes to data erasure rules in production, the manifest must be cryptographically signed by a Data Protection Officer (DPO) or Lead Engineer.
82
66
 
67
+ ### 4. Cryptographically Sign the Manifest
68
+ Lock down the rules to prevent unauthorized changes in your CI/CD pipeline.
83
69
  ```bash
84
70
  dpdp-cli sign --config ./compliance.worker.yml --key ./worker.pkcs8.key
85
71
  ```
86
- *This generates a detached signature file (e.g., `compliance.worker.yml.sig`). The worker will fail to boot if this signature does not match the manifest.*
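As a rough illustration of how a detached Ed25519 signature locks a manifest, the sketch below uses Node's built-in `node:crypto` module (also available under Bun). It does not reproduce the CLI's key format or file layout; the in-memory manifest text and variable names are assumptions made for the example.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Stand-in manifest bytes; the real file would be read from disk.
const manifest = new TextEncoder().encode("rules:\n  - id: dpdp_standard\n");

// Ed25519 keypair. Ed25519 takes no separate digest, hence the null algorithm below.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Detached signature: kept next to the manifest rather than inside it.
const signature = sign(null, manifest, privateKey);

// Verification succeeds only for the exact bytes that were signed.
const untouched = verify(null, manifest, publicKey, signature);
const tampered = verify(
  null,
  new TextEncoder().encode("rules:\n  - id: something_else\n"),
  publicKey,
  signature,
);

console.log({ untouched, tampered, signatureBytes: signature.length });
```

Any edit to the manifest after signing makes verification fail, which is the property that lets a worker refuse to boot on a mismatched signature.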
87
-
88
- ---
89
-
90
- ### 5. `check-integrity` & `verify-schema` (CI/CD Gates)
91
- These commands are designed for CI/CD pipelines to ensure the live database schema matches the legal attestation hash stored in the signed manifest.
92
72
 
73
+ ### 5. Simulate an Erasure (Dry-Run)
74
+ Test the erasure on a specific user. This command runs entirely within an isolated transaction that is automatically rolled back, so it is 100% safe.
93
75
  ```bash
94
- # Verify the compiled DAG and live schema hash
95
- dpdp-cli check-integrity --url "postgres://.../app_db" --config ./compliance.worker.yml
96
-
97
- # Check only the live schema hash against the legal attestation
98
- dpdp-cli verify-schema --url "postgres://.../app_db" --config ./compliance.worker.yml
76
+ dpdp-cli dry-run --id "user_12345" --url "postgres://user:pass@localhost:5432/app_db" --config ./compliance.worker.yml
99
77
  ```
100
- *If a developer adds a new column to the database without updating and re-signing the manifest, these commands will exit with a non-zero status.*
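The drift check described here can be pictured as comparing two digests. The sketch below is an assumption-heavy illustration: the diff does not show how the engine derives `schema_hash`, so the `ColumnInfo` shape, the canonical string format, the choice of SHA-256, and the sample columns are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical column descriptor; the engine's real hash input is not documented here.
interface ColumnInfo {
  table: string;
  column: string;
  dataType: string;
}

// Hash a canonical (sorted) listing so the order of catalog rows does not matter.
function schemaHash(columns: ColumnInfo[]): string {
  const canonical = columns
    .map((c) => `${c.table}.${c.column}:${c.dataType}`)
    .sort()
    .join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}

// Made-up schema snapshots: the attested one, and a live one with an extra column.
const attested: ColumnInfo[] = [
  { table: "public.users", column: "id", dataType: "bigint" },
  { table: "public.users", column: "email", dataType: "character varying" },
];
const live: ColumnInfo[] = [
  ...attested,
  { table: "public.users", column: "date_of_birth", dataType: "date" },
];

const attestedHash = schemaHash(attested);
const liveHash = schemaHash(live);
const drifted = attestedHash !== liveHash;
console.log(drifted ? "schema drift: re-review and re-sign the manifest" : "schema matches attestation");
```

In a CI job, the drifted case would map to a non-zero exit status so the build fails until the manifest is reviewed and signed again.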
101
78
 
102
79
  ---
103
80
 
104
- ### 6. `dry-run` (Safe Erasure Simulation)
105
- Simulates a PII vault and redaction operation for a specific user. It runs inside an isolated transaction that is automatically rolled back.
81
+ ## 🔒 CI/CD Integrity Checks
106
82
 
107
- ```bash
108
- dpdp-cli dry-run --id "user_12345" --url "postgres://.../app_db" --config ./compliance.worker.yml
109
- ```
110
- *This is the recommended safety check to help ensure your configuration captures related PII without breaking foreign keys.*
111
-
112
- ---
113
-
114
- ### 7. `graph` (Dependency Visualization)
115
- Visualizes the recursive table dependencies (FK DAG) for a specific root table, helping you understand how data cascades down from a user.
83
+ You can use the CLI in your GitHub Actions or GitLab CI to fail builds if a developer modifies the database schema without updating the signed compliance manifest:
116
84
 
117
85
  ```bash
118
- dpdp-cli graph --table public.users --url "postgres://.../app_db"
86
+ dpdp-cli check-integrity --url "postgres://..." --config ./compliance.worker.yml
119
87
  ```
120
88
 
121
89
  ---
122
90
 
123
- ## ⚙️ Standard Workflow Example
124
-
125
- Setting up the engine generally follows this workflow:
91
+ ## 📖 Deep Dive
126
92
 
127
- 1. **Introspect** the database to generate a draft manifest:
128
- `dpdp-cli introspect -u postgres://... -r public.users -s public -o compliance.worker.yml`
129
- 2. **Review & Tweak** the `compliance.worker.yml` manually (fix false positives, add missing logical links, select masking actions like `HMAC` or `SET NULL`).
130
- 3. **Generate Keys** for signing:
131
- `dpdp-cli keygen`
132
- 4. **Sign** the finalized manifest:
133
- `dpdp-cli sign -c compliance.worker.yml -k worker.pkcs8.key`
134
- 5. **Dry-Run** an erasure to verify it behaves as expected:
135
- `dpdp-cli dry-run -i "test_user_id" -u postgres://... -c compliance.worker.yml`
136
- 6. **Deploy** the signed manifest and the detached `.sig` file to your production Worker.
137
-
138
- ---
93
+ Want to understand the cryptographic shredding architecture under the hood? Read our full documentation at the main repository:
139
94
 
140
- For architectural details and deep-dives into how the cryptographic shredding works, refer to the [Main Documentation](https://github.com/devxdh/dpdp-erasure-engine/tree/main/apps/docs).
95
+ **[DPDP Erasure Engine GitHub Repository](https://github.com/devxdh/dpdp-erasure-engine)**
@@ -0,0 +1,164 @@
1
+ # AUTO-GENERATED BY INTROSPECTOR
2
+ # REVIEW REQUIRED: DPO must validate every table, join condition, and PII column before production use.
3
+ # Generated At: 2026-06-12T05:19:07.235Z
4
+
5
+ legal_attestation:
6
+   dpo_identifier: PENDING_REVIEW
7
+   configuration_version: introspector-draft
8
+   legal_review_date: PENDING_REVIEW
9
+   schema_hash: ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c
10
+   generated_by: compliance-introspector-v1
11
+   acknowledgment: PENDING_REVIEW
12
+
13
+ legal_disclaimer:
14
+   text: "Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings."
15
+
16
+ rules:
17
+   - id: dpdp_standard
18
+     root_table: public.users
19
+     max_depth: 32
20
+     targets:
21
+       - table: public.users
22
+         # Introspector Confidence: 0.950 (email)
23
+         # Introspector Confidence: 0.920 (indian_mobile)
24
+         pii_columns: [email, phone_number]
25
+       - table: public.kyc_documents
26
+         parent: public.users
27
+         join: "public.users.id = public.kyc_documents.user_id"
28
+         parent_columns: [id]
29
+         child_columns: [user_id]
30
+         pii_columns: []
31
+       - table: public.orders
32
+         parent: public.users
33
+         join: "public.users.id = public.orders.user_id"
34
+         parent_columns: [id]
35
+         child_columns: [user_id]
36
+         pii_columns: []
37
+       - table: public.support_tickets
38
+         parent: public.users
39
+         join: "public.users.id = public.support_tickets.user_id"
40
+         parent_columns: [id]
41
+         child_columns: [user_id]
42
+         # Introspector Confidence: 0.900 (indian_mobile)
43
+         pii_columns: [description]
44
+         primary_key_columns: [id]
45
+         action: redact
46
+         mutation_rules:
47
+           description: HMAC
48
+       - table: public.user_addresses
49
+         parent: public.users
50
+         join: "public.users.id = public.user_addresses.user_id"
51
+         parent_columns: [id]
52
+         child_columns: [user_id]
53
+         # Introspector Confidence: 0.780 (indian_pin_code)
54
+         pii_columns: [pincode]
55
+         primary_key_columns: [id]
56
+         action: redact
57
+         mutation_rules:
58
+           pincode: HMAC
59
+       - table: public.user_devices
60
+         parent: public.users
61
+         join: "public.users.id = public.user_devices.user_id"
62
+         parent_columns: [id]
63
+         child_columns: [user_id]
64
+         # Introspector Confidence: 0.820 (metadata)
65
+         pii_columns: [last_ip_address]
66
+         primary_key_columns: [id]
67
+         action: redact
68
+         mutation_rules:
69
+           last_ip_address: HMAC
70
+       - table: public.user_preferences
71
+         parent: public.users
72
+         join: "public.users.id = public.user_preferences.user_id"
73
+         parent_columns: [id]
74
+         child_columns: [user_id]
75
+         pii_columns: []
76
+       - table: public.order_items
77
+         parent: public.orders
78
+         join: "public.orders.id = public.order_items.order_id"
79
+         parent_columns: [id]
80
+         child_columns: [order_id]
81
+         pii_columns: []
82
+       - table: public.payments
83
+         parent: public.orders
84
+         join: "public.orders.id = public.payments.order_id"
85
+         parent_columns: [id]
86
+         child_columns: [order_id]
87
+         pii_columns: []
88
+       - table: public.ticket_messages
89
+         parent: public.support_tickets
90
+         join: "public.support_tickets.id = public.ticket_messages.ticket_id"
91
+         parent_columns: [id]
92
+         child_columns: [ticket_id]
93
+         # Introspector Confidence: 0.900 (indian_mobile)
94
+         pii_columns: [message_body]
95
+         primary_key_columns: [id]
96
+         action: redact
97
+         mutation_rules:
98
+           message_body: HMAC
99
+       - table: public.abandoned_carts
100
+         parent: public.users
101
+         join: "LOGICAL_LINK (customer_id)"
102
+         parent_columns: [customer_id]
103
+         child_columns: [customer_id]
104
+         pii_columns: []
105
+       - table: public.audit_logs
106
+         parent: public.users
107
+         join: "LOGICAL_LINK (actor_id)"
108
+         parent_columns: [actor_id]
109
+         child_columns: [actor_id]
110
+         # Introspector Confidence: 0.820 (ipv4)
111
+         pii_columns: [ip_address]
112
+         primary_key_columns: [id]
113
+         action: redact
114
+         mutation_rules:
115
+           ip_address: HMAC
116
+       - table: public.legacy_crm_notes
117
+         parent: public.users
118
+         join: "LOGICAL_LINK (client_id)"
119
+         parent_columns: [client_id]
120
+         child_columns: [client_id]
121
+         # Introspector Confidence: 0.950 (email, indian_mobile)
122
+         pii_columns: [agent_notes]
123
+         primary_key_columns: [id]
124
+         action: redact
125
+         mutation_rules:
126
+           agent_notes: HMAC
127
+       - table: public.marketing_campaign_clicks
128
+         parent: public.users
129
+         join: "LOGICAL_LINK (target_email)"
130
+         parent_columns: [target_email]
131
+         child_columns: [target_email]
132
+         # Introspector Confidence: 0.950 (email)
133
+         pii_columns: [target_email]
134
+         primary_key_columns: [id]
135
+         action: redact
136
+         mutation_rules:
137
+           target_email: HMAC
138
+       - table: public.third_party_telemetry
139
+         parent: public.users
140
+         join: "LOGICAL_LINK (user_uuid)"
141
+         parent_columns: [user_uuid]
142
+         child_columns: [user_uuid]
143
+         pii_columns: []
144
+
145
+ # [Potential Logical Link] public.users.customer_id <-> public.abandoned_carts.customer_id - Table exposes customer_id which conceptually maps to the root entity.
146
+ # [Potential Logical Link] public.users.actor_id <-> public.audit_logs.actor_id - Table exposes actor_id which conceptually maps to the root entity.
147
+ # [Potential Logical Link] public.kyc_documents.user_id <-> public.orders.user_id - Both tables expose user_id but no physical foreign key was found.
148
+ # [Potential Logical Link] public.kyc_documents.user_id <-> public.support_tickets.user_id - Both tables expose user_id but no physical foreign key was found.
149
+ # [Potential Logical Link] public.kyc_documents.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
150
+ # [Potential Logical Link] public.kyc_documents.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
151
+ # [Potential Logical Link] public.kyc_documents.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
152
+ # [Potential Logical Link] public.orders.user_id <-> public.support_tickets.user_id - Both tables expose user_id but no physical foreign key was found.
153
+ # [Potential Logical Link] public.orders.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
154
+ # [Potential Logical Link] public.orders.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
155
+ # [Potential Logical Link] public.orders.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
156
+ # [Potential Logical Link] public.support_tickets.user_id <-> public.user_addresses.user_id - Both tables expose user_id but no physical foreign key was found.
157
+ # [Potential Logical Link] public.support_tickets.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
158
+ # [Potential Logical Link] public.support_tickets.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
159
+ # [Potential Logical Link] public.user_addresses.user_id <-> public.user_devices.user_id - Both tables expose user_id but no physical foreign key was found.
160
+ # [Potential Logical Link] public.user_addresses.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
161
+ # [Potential Logical Link] public.user_devices.user_id <-> public.user_preferences.user_id - Both tables expose user_id but no physical foreign key was found.
162
+ # [Potential Logical Link] public.users.client_id <-> public.legacy_crm_notes.client_id - Table exposes client_id which conceptually maps to the root entity.
163
+ # [Potential Logical Link] public.users.target_email <-> public.marketing_campaign_clicks.target_email - Table exposes target_email which conceptually maps to the root entity.
164
+ # [Potential Logical Link] public.users.user_uuid <-> public.third_party_telemetry.user_uuid - Table exposes user_uuid which conceptually maps to the root entity.
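Several targets in this manifest pair `action: redact` with a `mutation_rules` entry of `HMAC`. The manifest does not say which hash function or key source the engine uses, so the following is only a sketch assuming HMAC-SHA-256 from `node:crypto`, with a hypothetical `hmacRedact` helper and a placeholder key.

```typescript
import { createHmac } from "node:crypto";

// Replace a PII value with a keyed, deterministic pseudonym that cannot be
// reversed without the key.
function hmacRedact(value: string, key: string): string {
  return createHmac("sha256", key).update(value).digest("hex");
}

// Placeholder key for illustration; a real key would come from a secret store.
const redactionKey = "example-key-do-not-use";

const first = hmacRedact("203.0.113.7", redactionKey);
const again = hmacRedact("203.0.113.7", redactionKey);
const other = hmacRedact("203.0.113.8", redactionKey);

// Equal inputs map to the same token; different inputs do not.
console.log(first === again, first === other);
```

Because the output is deterministic for a given key, redacted rows can still be joined or counted by token, and destroying the key removes the ability to test a guessed value against the stored tokens.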
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "dpdp-erasure-cli",
3
- "version": "1.0.11",
3
+ "version": "1.1.0",
4
4
  "license": "Apache-2.0",
5
5
  "keywords": [
6
6
  "dpdp",
@@ -54,4 +54,4 @@
54
54
  "postgres": "^3.4.9",
55
55
  "zod": "^4.4.2"
56
56
  }
57
- }
57
+ }
package/report.json ADDED
@@ -0,0 +1,370 @@
1
+ {
2
+ "summary": {
3
+ "rootTable": "public.users",
4
+ "generatedAt": "2026-06-12T05:19:07.235Z",
5
+ "schemaHash": "ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c",
6
+ "targetCount": 15,
7
+ "tablesWithPii": 8,
8
+ "piiColumnCount": 9,
9
+ "highConfidenceCount": 6,
10
+ "reviewRequiredCount": 3,
11
+ "potentialLogicalLinkCount": 20
12
+ },
13
+ "findings": [
14
+ {
15
+ "table": "public.legacy_crm_notes",
16
+ "column": "agent_notes",
17
+ "dataType": "text",
18
+ "confidence": 0.95,
19
+ "metadataScore": 0,
20
+ "contentMatchRatio": 1,
21
+ "sampleSize": 50,
22
+ "matchedSignatures": [
23
+ "email",
24
+ "indian_mobile"
25
+ ]
26
+ },
27
+ {
28
+ "table": "public.marketing_campaign_clicks",
29
+ "column": "target_email",
30
+ "dataType": "character varying",
31
+ "confidence": 0.95,
32
+ "metadataScore": 0.92,
33
+ "contentMatchRatio": 1,
34
+ "sampleSize": 50,
35
+ "matchedSignatures": [
36
+ "email"
37
+ ]
38
+ },
39
+ {
40
+ "table": "public.users",
41
+ "column": "email",
42
+ "dataType": "character varying",
43
+ "confidence": 0.95,
44
+ "metadataScore": 0.92,
45
+ "contentMatchRatio": 1,
46
+ "sampleSize": 50,
47
+ "matchedSignatures": [
48
+ "email"
49
+ ]
50
+ },
51
+ {
52
+ "table": "public.users",
53
+ "column": "phone_number",
54
+ "dataType": "character varying",
55
+ "confidence": 0.92,
56
+ "metadataScore": 0.92,
57
+ "contentMatchRatio": 1,
58
+ "sampleSize": 50,
59
+ "matchedSignatures": [
60
+ "indian_mobile"
61
+ ]
62
+ },
63
+ {
64
+ "table": "public.support_tickets",
65
+ "column": "description",
66
+ "dataType": "text",
67
+ "confidence": 0.9,
68
+ "metadataScore": 0,
69
+ "contentMatchRatio": 1,
70
+ "sampleSize": 50,
71
+ "matchedSignatures": [
72
+ "indian_mobile"
73
+ ]
74
+ },
75
+ {
76
+ "table": "public.ticket_messages",
77
+ "column": "message_body",
78
+ "dataType": "text",
79
+ "confidence": 0.9,
80
+ "metadataScore": 0,
81
+ "contentMatchRatio": 1,
82
+ "sampleSize": 50,
83
+ "matchedSignatures": [
84
+ "indian_mobile"
85
+ ]
86
+ },
87
+ {
88
+ "table": "public.audit_logs",
89
+ "column": "ip_address",
90
+ "dataType": "inet",
91
+ "confidence": 0.82,
92
+ "metadataScore": 0.82,
93
+ "contentMatchRatio": 1,
94
+ "sampleSize": 50,
95
+ "matchedSignatures": [
96
+ "ipv4"
97
+ ]
98
+ },
99
+ {
100
+ "table": "public.user_devices",
101
+ "column": "last_ip_address",
102
+ "dataType": "inet",
103
+ "confidence": 0.82,
104
+ "metadataScore": 0.82,
105
+ "contentMatchRatio": 0,
106
+ "sampleSize": 0,
107
+ "matchedSignatures": []
108
+ },
109
+ {
110
+ "table": "public.user_addresses",
111
+ "column": "pincode",
112
+ "dataType": "character varying",
113
+ "confidence": 0.78,
114
+ "metadataScore": 0.62,
115
+ "contentMatchRatio": 1,
116
+ "sampleSize": 50,
117
+ "matchedSignatures": [
118
+ "indian_pin_code"
119
+ ]
120
+ }
121
+ ],
122
+ "potentialLogicalLinks": [
123
+ {
124
+ "sourceTable": {
125
+ "schema": "public",
126
+ "table": "users"
127
+ },
128
+ "targetTable": {
129
+ "schema": "public",
130
+ "table": "abandoned_carts"
131
+ },
132
+ "column": "customer_id",
133
+ "reason": "Table exposes customer_id which conceptually maps to the root entity."
134
+ },
135
+ {
136
+ "sourceTable": {
137
+ "schema": "public",
138
+ "table": "users"
139
+ },
140
+ "targetTable": {
141
+ "schema": "public",
142
+ "table": "audit_logs"
143
+ },
144
+ "column": "actor_id",
145
+ "reason": "Table exposes actor_id which conceptually maps to the root entity."
146
+ },
147
+ {
148
+ "sourceTable": {
149
+ "schema": "public",
150
+ "table": "kyc_documents"
151
+ },
152
+ "targetTable": {
153
+ "schema": "public",
154
+ "table": "orders"
155
+ },
156
+ "column": "user_id",
157
+ "reason": "Both tables expose user_id but no physical foreign key was found."
158
+ },
159
+ {
160
+ "sourceTable": {
161
+ "schema": "public",
162
+ "table": "kyc_documents"
163
+ },
164
+ "targetTable": {
165
+ "schema": "public",
166
+ "table": "support_tickets"
167
+ },
168
+ "column": "user_id",
169
+ "reason": "Both tables expose user_id but no physical foreign key was found."
170
+ },
171
+ {
172
+ "sourceTable": {
173
+ "schema": "public",
174
+ "table": "kyc_documents"
175
+ },
176
+ "targetTable": {
177
+ "schema": "public",
178
+ "table": "user_addresses"
179
+ },
180
+ "column": "user_id",
181
+ "reason": "Both tables expose user_id but no physical foreign key was found."
182
+ },
183
+ {
184
+ "sourceTable": {
185
+ "schema": "public",
186
+ "table": "kyc_documents"
187
+ },
188
+ "targetTable": {
189
+ "schema": "public",
190
+ "table": "user_devices"
191
+ },
192
+ "column": "user_id",
193
+ "reason": "Both tables expose user_id but no physical foreign key was found."
194
+ },
195
+ {
196
+ "sourceTable": {
197
+ "schema": "public",
198
+ "table": "kyc_documents"
199
+ },
200
+ "targetTable": {
201
+ "schema": "public",
202
+ "table": "user_preferences"
203
+ },
204
+ "column": "user_id",
205
+ "reason": "Both tables expose user_id but no physical foreign key was found."
206
+ },
207
+ {
208
+ "sourceTable": {
209
+ "schema": "public",
210
+ "table": "orders"
211
+ },
212
+ "targetTable": {
213
+ "schema": "public",
214
+ "table": "support_tickets"
215
+ },
216
+ "column": "user_id",
217
+ "reason": "Both tables expose user_id but no physical foreign key was found."
218
+ },
219
+ {
220
+ "sourceTable": {
221
+ "schema": "public",
222
+ "table": "orders"
223
+ },
224
+ "targetTable": {
225
+ "schema": "public",
226
+ "table": "user_addresses"
227
+ },
228
+ "column": "user_id",
229
+ "reason": "Both tables expose user_id but no physical foreign key was found."
230
+ },
231
+ {
232
+ "sourceTable": {
233
+ "schema": "public",
234
+ "table": "orders"
235
+ },
236
+ "targetTable": {
237
+ "schema": "public",
238
+ "table": "user_devices"
239
+ },
240
+ "column": "user_id",
241
+ "reason": "Both tables expose user_id but no physical foreign key was found."
242
+ },
243
+ {
244
+ "sourceTable": {
245
+ "schema": "public",
246
+ "table": "orders"
247
+ },
248
+ "targetTable": {
249
+ "schema": "public",
250
+ "table": "user_preferences"
251
+ },
252
+ "column": "user_id",
253
+ "reason": "Both tables expose user_id but no physical foreign key was found."
254
+ },
255
+ {
256
+ "sourceTable": {
257
+ "schema": "public",
258
+ "table": "support_tickets"
259
+ },
260
+ "targetTable": {
261
+ "schema": "public",
262
+ "table": "user_addresses"
263
+ },
264
+ "column": "user_id",
265
+ "reason": "Both tables expose user_id but no physical foreign key was found."
266
+ },
267
+ {
268
+ "sourceTable": {
269
+ "schema": "public",
270
+ "table": "support_tickets"
271
+ },
272
+ "targetTable": {
273
+ "schema": "public",
274
+ "table": "user_devices"
275
+ },
276
+ "column": "user_id",
277
+ "reason": "Both tables expose user_id but no physical foreign key was found."
278
+ },
279
+ {
280
+ "sourceTable": {
281
+ "schema": "public",
282
+ "table": "support_tickets"
283
+ },
284
+ "targetTable": {
285
+ "schema": "public",
286
+ "table": "user_preferences"
287
+ },
288
+ "column": "user_id",
289
+ "reason": "Both tables expose user_id but no physical foreign key was found."
290
+ },
291
+ {
292
+ "sourceTable": {
293
+ "schema": "public",
294
+ "table": "user_addresses"
295
+ },
296
+ "targetTable": {
297
+ "schema": "public",
298
+ "table": "user_devices"
299
+ },
300
+ "column": "user_id",
301
+ "reason": "Both tables expose user_id but no physical foreign key was found."
302
+ },
303
+ {
304
+ "sourceTable": {
305
+ "schema": "public",
306
+ "table": "user_addresses"
307
+ },
308
+ "targetTable": {
309
+ "schema": "public",
310
+ "table": "user_preferences"
311
+ },
312
+ "column": "user_id",
313
+ "reason": "Both tables expose user_id but no physical foreign key was found."
314
+ },
315
+ {
316
+ "sourceTable": {
317
+ "schema": "public",
318
+ "table": "user_devices"
319
+ },
320
+ "targetTable": {
321
+ "schema": "public",
322
+ "table": "user_preferences"
323
+ },
324
+ "column": "user_id",
325
+ "reason": "Both tables expose user_id but no physical foreign key was found."
326
+ },
327
+ {
328
+ "sourceTable": {
329
+ "schema": "public",
330
+ "table": "users"
331
+ },
332
+ "targetTable": {
333
+ "schema": "public",
334
+ "table": "legacy_crm_notes"
335
+ },
336
+ "column": "client_id",
337
+ "reason": "Table exposes client_id which conceptually maps to the root entity."
338
+ },
339
+ {
340
+ "sourceTable": {
341
+ "schema": "public",
342
+ "table": "users"
343
+ },
344
+ "targetTable": {
345
+ "schema": "public",
346
+ "table": "marketing_campaign_clicks"
347
+ },
348
+ "column": "target_email",
349
+ "reason": "Table exposes target_email which conceptually maps to the root entity."
350
+ },
351
+ {
352
+ "sourceTable": {
353
+ "schema": "public",
354
+ "table": "users"
355
+ },
356
+ "targetTable": {
357
+ "schema": "public",
358
+ "table": "third_party_telemetry"
359
+ },
360
+ "column": "user_uuid",
361
+ "reason": "Table exposes user_uuid which conceptually maps to the root entity."
362
+ }
363
+ ],
364
+ "nextSteps": [
365
+ "Review every PII column and potential logical link with the application owner.",
366
+ "Copy reviewed targets into compliance.worker.yml and complete legal_attestation.",
367
+ "Run compliance-worker check-integrity before allowing live worker boot.",
368
+ "Sign the reviewed manifest with compliance-worker sign after DPO approval."
369
+ ]
370
+ }
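The `summary` counts in this report can be re-derived from its `findings`. The sketch below copies the table, column, and confidence values from the report. The 0.9 cutoff between "high confidence" and "review required" is an assumption: the report never states it, but that value reproduces the published counts.

```typescript
interface Finding {
  table: string;
  column: string;
  confidence: number;
}

// Values copied from the findings array in report.json.
const findings: Finding[] = [
  { table: "public.legacy_crm_notes", column: "agent_notes", confidence: 0.95 },
  { table: "public.marketing_campaign_clicks", column: "target_email", confidence: 0.95 },
  { table: "public.users", column: "email", confidence: 0.95 },
  { table: "public.users", column: "phone_number", confidence: 0.92 },
  { table: "public.support_tickets", column: "description", confidence: 0.9 },
  { table: "public.ticket_messages", column: "message_body", confidence: 0.9 },
  { table: "public.audit_logs", column: "ip_address", confidence: 0.82 },
  { table: "public.user_devices", column: "last_ip_address", confidence: 0.82 },
  { table: "public.user_addresses", column: "pincode", confidence: 0.78 },
];

// Assumed cutoff, inferred from the data rather than documented.
const HIGH_CONFIDENCE = 0.9;

const summary = {
  tablesWithPii: new Set(findings.map((f) => f.table)).size,
  piiColumnCount: findings.length,
  highConfidenceCount: findings.filter((f) => f.confidence >= HIGH_CONFIDENCE).length,
  reviewRequiredCount: findings.filter((f) => f.confidence < HIGH_CONFIDENCE).length,
};
console.log(summary);
```

Under that assumption the three review-required findings are `audit_logs.ip_address`, `user_devices.last_ip_address`, and `user_addresses.pincode`.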
package/report.md ADDED
@@ -0,0 +1,57 @@
1
+ # Compliance Introspector Report
2
+
3
+ ## Summary
4
+
5
+ - Root table: `public.users`
6
+ - Generated at: `2026-06-12T05:19:07.235Z`
7
+ - Schema hash: `ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c`
8
+ - DAG targets: 15
9
+ - Tables with PII: 8
10
+ - PII columns: 9
11
+ - High-confidence findings: 6
12
+ - Review-required findings: 3
13
+ - Potential logical links: 20
14
+
15
+ ## PII Findings
16
+
17
+ | Table | Column | Type | Confidence | Metadata | Content | Signatures |
18
+ | --- | --- | --- | ---: | ---: | ---: | --- |
19
+ | `public.legacy_crm_notes` | `agent_notes` | `text` | 0.950 | 0.000 | 1.000 | email, indian_mobile |
20
+ | `public.marketing_campaign_clicks` | `target_email` | `character varying` | 0.950 | 0.920 | 1.000 | email |
21
+ | `public.users` | `email` | `character varying` | 0.950 | 0.920 | 1.000 | email |
22
+ | `public.users` | `phone_number` | `character varying` | 0.920 | 0.920 | 1.000 | indian_mobile |
23
+ | `public.support_tickets` | `description` | `text` | 0.900 | 0.000 | 1.000 | indian_mobile |
24
+ | `public.ticket_messages` | `message_body` | `text` | 0.900 | 0.000 | 1.000 | indian_mobile |
25
+ | `public.audit_logs` | `ip_address` | `inet` | 0.820 | 0.820 | 1.000 | ipv4 |
26
+ | `public.user_devices` | `last_ip_address` | `inet` | 0.820 | 0.820 | 0.000 | metadata |
27
+ | `public.user_addresses` | `pincode` | `character varying` | 0.780 | 0.620 | 1.000 | indian_pin_code |
28
+
29
+ ## Potential Logical Links
30
+
31
+ - `public.users.customer_id` <-> `public.abandoned_carts.customer_id`: Table exposes customer_id which conceptually maps to the root entity.
32
+ - `public.users.actor_id` <-> `public.audit_logs.actor_id`: Table exposes actor_id which conceptually maps to the root entity.
33
+ - `public.kyc_documents.user_id` <-> `public.orders.user_id`: Both tables expose user_id but no physical foreign key was found.
34
+ - `public.kyc_documents.user_id` <-> `public.support_tickets.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.kyc_documents.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.kyc_documents.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.kyc_documents.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.orders.user_id` <-> `public.support_tickets.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.orders.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.orders.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.orders.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.support_tickets.user_id` <-> `public.user_addresses.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.support_tickets.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.support_tickets.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.user_addresses.user_id` <-> `public.user_devices.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.user_addresses.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.user_devices.user_id` <-> `public.user_preferences.user_id`: Both tables expose user_id but no physical foreign key was found.
+ - `public.users.client_id` <-> `public.legacy_crm_notes.client_id`: Table exposes client_id which conceptually maps to the root entity.
+ - `public.users.target_email` <-> `public.marketing_campaign_clicks.target_email`: Table exposes target_email which conceptually maps to the root entity.
+ - `public.users.user_uuid` <-> `public.third_party_telemetry.user_uuid`: Table exposes user_uuid which conceptually maps to the root entity.
+
+ ## Next Steps
+
+ - Review every PII column and potential logical link with the application owner.
+ - Copy reviewed targets into compliance.worker.yml and complete legal_attestation.
+ - Run compliance-worker check-integrity before allowing live worker boot.
+ - Sign the reviewed manifest with compliance-worker sign after DPO approval.
@@ -52,9 +52,9 @@ async function hardDeleteSatelliteRows(
 
   while (true) {
     const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
-    const deletedRows = await tx<{ id: string | number }[]>`
+    const deletedRows = await tx<{ ctid: string }[]>`
       WITH batch AS (
-        SELECT id
+        SELECT ctid
         FROM ${tx(appSchema)}.${tx(tableName)}
         WHERE ${tx(lookupColumn)} = ${lookupValue}
         ${tenantFilter}
@@ -62,8 +62,8 @@ async function hardDeleteSatelliteRows(
         FOR UPDATE SKIP LOCKED
       )
       DELETE FROM ${tx(appSchema)}.${tx(tableName)}
-      WHERE id IN (SELECT id FROM batch)
-      RETURNING id
+      WHERE ctid IN (SELECT ctid FROM batch)
+      RETURNING ctid
     `;
 
     if (deletedRows.length === 0) {
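
The two hunks above change the batch key from `id` to PostgreSQL's `ctid` system column, so the batched delete presumably no longer depends on the satellite table having an `id` column. A minimal sketch of the resulting statement shape follows, written as a plain string builder rather than the tagged-template `tx` client used in the package. `quoteIdent`, `buildBatchDeleteSql`, and the `LIMIT` clause are illustrative assumptions; the lines between the two hunks are not shown in this diff.

```typescript
// Hypothetical helper: double-quote a PostgreSQL identifier, escaping embedded quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Builds a batched DELETE keyed on ctid. The lookup value is left as the
// positional parameter $1 so it is never interpolated into the SQL text.
function buildBatchDeleteSql(
  schema: string,
  table: string,
  lookupColumn: string,
  batchSize: number,
): string {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const target = `${quoteIdent(schema)}.${quoteIdent(table)}`;
  return [
    "WITH batch AS (",
    "  SELECT ctid",
    `  FROM ${target}`,
    `  WHERE ${quoteIdent(lookupColumn)} = $1`,
    `  LIMIT ${batchSize}`, // assumption: the real batch limit is outside the shown hunks
    "  FOR UPDATE SKIP LOCKED",
    ")",
    `DELETE FROM ${target}`,
    "WHERE ctid IN (SELECT ctid FROM batch)",
    "RETURNING ctid",
  ].join("\n");
}
```

A `ctid` identifies the physical location of a row version and can change when the row is updated or the table is rewritten, so it is only safe as a key within a single statement like this one, not as a stored identifier.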
@@ -3,7 +3,7 @@ import type { Tsql } from "@/types";
 import { assertIdentifier } from "@/utils";
 
 interface SatelliteRowId {
-  id: number;
+  ctid: string;
 }
 
 async function yieldWorkerEventLoop(): Promise<void> {
@@ -78,7 +78,7 @@ export async function redactSatelliteTable(
     const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
     const updatedRows = await tx<SatelliteRowId[]>`
       WITH batch AS (
-        SELECT id
+        SELECT ctid
         FROM ${tx(schema)}.${tx(table)}
         WHERE ${tx(safeLookupColumn)} = ${lookupValue}
         ${tenantFilter}
@@ -87,8 +87,8 @@ export async function redactSatelliteTable(
       )
       UPDATE ${tx(schema)}.${tx(table)}
       SET ${tx(safeLookupColumn)} = ${newHmacValue}
-      WHERE id IN (SELECT id FROM batch)
-      RETURNING id
+      WHERE ctid IN (SELECT ctid FROM batch)
+      RETURNING ctid
     `;
 
     if (updatedRows.length === 0) {
@@ -137,6 +137,7 @@ const METADATA_PATTERNS: Array<{ pattern: RegExp; score: number }> = [
   { pattern: /(^|_)(driving_license|driving_licence|license_number|licence_number|dl_number|dl_no)($|_)/i, score: MEDIUM_METADATA_SCORE },
   { pattern: /(^|_)(address|street|postal_code|zip_code|pin_code|pincode)($|_)/i, score: WEAK_METADATA_SCORE },
   { pattern: /(^|_)(device_fingerprint|device_id|advertising_id|gaid|idfa)($|_)/i, score: WEAK_METADATA_SCORE },
+  { pattern: /(^|_)(document_number|identity_number|id_number)($|_)/i, score: WEAK_METADATA_SCORE },
 ];
 
 function qualifiedKey(table: QualifiedTable): string {
@@ -336,12 +337,23 @@ function classifyLeafDetailed(value: string, columnName: string = ""): ContentSi
   const bytes = textEncoder.encode(value.trim());
   try {
     const normalized = textDecoder.decode(bytes).trim();
-    return CONTENT_SIGNATURES
-      .filter((signature) =>
-        signatureHasMetadataSupport(signature, columnName) &&
-        signature.pattern.test(normalized) &&
-        (!signature.validate || signature.validate(normalized))
-      );
+    // Split into tokens and strip leading/trailing punctuation so regexes can match substrings
+    const tokens = normalized.split(/\s+/).map((t) => t.replace(/^[^\w\+]+|[^\w]+$/g, ""));
+    const candidates = Array.from(new Set([normalized, ...tokens])).filter((t) => t.length > 0);
+
+    const matches = new Set<ContentSignature>();
+    for (const candidate of candidates) {
+      for (const signature of CONTENT_SIGNATURES) {
+        if (
+          signatureHasMetadataSupport(signature, columnName) &&
+          signature.pattern.test(candidate) &&
+          (!signature.validate || signature.validate(candidate))
+        ) {
+          matches.add(signature);
+        }
+      }
+    }
+    return Array.from(matches);
   } finally {
     bytes.fill(0);
   }
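
The hunk above tests each signature against the whole value and against every whitespace-separated token with surrounding punctuation stripped, so a pattern anchored with `^...$` can still fire on PII embedded in free text. A minimal sketch of that behaviour follows, assuming a single illustrative `EMAIL` signature; the package's `CONTENT_SIGNATURES`, `validate` hooks, and metadata checks are not shown in this diff and are left out.

```typescript
interface Signature {
  name: string;
  pattern: RegExp;
}

// Illustrative signature only; not taken from the package.
const EMAIL: Signature = { name: "email", pattern: /^[^\s@]+@[^\s@]+\.[^\s@]+$/ };

// Returns the full value plus each token with leading/trailing punctuation
// removed. A leading "+" is preserved so international phone prefixes survive.
function candidateStrings(value: string): string[] {
  const normalized = value.trim();
  const tokens = normalized
    .split(/\s+/)
    .map((t) => t.replace(/^[^\w+]+|[^\w]+$/g, ""));
  return Array.from(new Set([normalized, ...tokens])).filter((t) => t.length > 0);
}

// Collects the names of all signatures that match any candidate string.
function matchSignatures(value: string, signatures: Signature[]): string[] {
  const matched = new Set<string>();
  for (const candidate of candidateStrings(value)) {
    for (const signature of signatures) {
      if (signature.pattern.test(candidate)) {
        matched.add(signature.name);
      }
    }
  }
  return Array.from(matched);
}
```

For example, `matchSignatures("Contact: <jane@example.com>, thanks!", [EMAIL])` matches on the token `jane@example.com`, whereas the anchored pattern fails against the full string because it contains whitespace.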
@@ -236,7 +236,7 @@ export async function discoverPotentialLogicalLinks(
   const byColumn = new Map<string, QualifiedTable[]>();
   for (const row of rows) {
     const normalized = row.column_name.toLowerCase();
-    if (!/^(?:user_id|account_id|customer_id|member_id|subject_id|.*_user_id)$/.test(normalized)) {
+    if (!/^(?:user_id|account_id|customer_id|client_id|actor_id|user_uuid|member_id|subject_id|.*_user_id|target_email|user_email)$/.test(normalized)) {
       continue;
     }
 
@@ -247,7 +247,25 @@ export async function discoverPotentialLogicalLinks(
 
   const links: PotentialLogicalLink[] = [];
   const emitted = new Set<string>();
+
   for (const [column, tables] of byColumn.entries()) {
+    for (const table of tables) {
+      // Explicitly link any orphan table that has an identity-like column to the root table
+      if (table.schema === root.schema && table.table === root.table) {
+        continue;
+      }
+      const key = physicalLinkKey(root, table, column);
+      if (!physicalLinks.has(key) && !emitted.has(key)) {
+        emitted.add(key);
+        links.push({
+          sourceTable: root,
+          targetTable: table,
+          column,
+          reason: `Table exposes ${column} which conceptually maps to the root entity.`,
+        });
+      }
+    }
+
     if (tables.length < 2) {
       continue;
     }
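
The added loop above emits a root-to-table link for every non-root table that carries an identity-like column, unless a physical foreign key already covers that pair. A minimal standalone sketch of that step follows. The key format in `linkKey` is an assumption, since `physicalLinkKey` is not shown in this diff, and `linkOrphansToRoot` is a hypothetical name.

```typescript
interface QualifiedTable {
  schema: string;
  table: string;
}

interface LogicalLink {
  sourceTable: QualifiedTable;
  targetTable: QualifiedTable;
  column: string;
  reason: string;
}

// Assumed key format, for illustration only.
function linkKey(a: QualifiedTable, b: QualifiedTable, column: string): string {
  return `${a.schema}.${a.table}->${b.schema}.${b.table}:${column}`;
}

// Links every non-root table exposing an identity-like column to the root,
// skipping pairs already covered by a physical foreign key or already emitted.
function linkOrphansToRoot(
  root: QualifiedTable,
  byColumn: Map<string, QualifiedTable[]>,
  physicalLinks: Set<string>,
): LogicalLink[] {
  const links: LogicalLink[] = [];
  const emitted = new Set<string>();
  byColumn.forEach((tables, column) => {
    for (const table of tables) {
      if (table.schema === root.schema && table.table === root.table) {
        continue; // the root never links to itself
      }
      const key = linkKey(root, table, column);
      if (physicalLinks.has(key) || emitted.has(key)) {
        continue;
      }
      emitted.add(key);
      links.push({
        sourceTable: root,
        targetTable: table,
        column,
        reason: `Table exposes ${column} which conceptually maps to the root entity.`,
      });
    }
  });
  return links;
}
```

Because a single matching table is enough to produce a link, this step runs before the `tables.length < 2` guard that gates the pairwise comparison.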
@@ -38,14 +38,6 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
     maxDepth,
   });
 
-  const classifiedColumns = await classifyDagTargets({
-    sql: options.sql,
-    targets: dag,
-    samplePercent: options.samplePercent,
-    sampleLimit: options.sampleLimit,
-    threshold: options.threshold,
-  });
-
   const [schemaHash, potentialLogicalLinks] = await Promise.all([
     detectSchemaDrift(options.sql, root.schema),
     discoverPotentialLogicalLinks(
@@ -58,7 +50,57 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
     ),
   ]);
 
-  const targets: IntrospectorTargetDraft[] = dag.map((target) => ({
+  const dagTableKeys = new Set(dag.map((t) => targetKey(t.table.schema, t.table.table)));
+  const logicalTargets: typeof dag = [];
+
+  for (const link of potentialLogicalLinks) {
+    const sourceKey = targetKey(link.sourceTable.schema, link.sourceTable.table);
+    const targetKeyStr = targetKey(link.targetTable.schema, link.targetTable.table);
+
+    let parentCol = link.column;
+    let childCol = link.column;
+
+    // Attempt intelligent primary key mapping for orphaned root links
+    if (link.sourceTable.schema === root.schema && link.sourceTable.table === root.table) {
+      parentCol = ["target_email", "user_email", "email_address"].includes(link.column) ? "email" : "id";
+    }
+
+    if (dagTableKeys.has(sourceKey) && !dagTableKeys.has(targetKeyStr)) {
+      dagTableKeys.add(targetKeyStr);
+      logicalTargets.push({
+        table: link.targetTable,
+        parentTable: link.sourceTable,
+        constraintName: null,
+        childColumns: [childCol],
+        parentColumns: [parentCol],
+        depth: maxDepth,
+        fkCondition: `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`,
+      });
+    } else if (dagTableKeys.has(targetKeyStr) && !dagTableKeys.has(sourceKey)) {
+      dagTableKeys.add(sourceKey);
+      logicalTargets.push({
+        table: link.sourceTable,
+        parentTable: link.targetTable,
+        constraintName: null,
+        childColumns: [childCol],
+        parentColumns: [parentCol],
+        depth: maxDepth,
+        fkCondition: `${formatQualifiedTable(link.targetTable)}.${parentCol} = ${formatQualifiedTable(link.sourceTable)}.${childCol}`,
+      });
+    }
+  }
+
+  const fullTargets = [...dag, ...logicalTargets];
+
+  const classifiedColumns = await classifyDagTargets({
+    sql: options.sql,
+    targets: fullTargets,
+    samplePercent: options.samplePercent,
+    sampleLimit: options.sampleLimit,
+    threshold: options.threshold,
+  });
+
+  const targets: IntrospectorTargetDraft[] = fullTargets.map((target) => ({
     table: target.table,
     parentTable: target.parentTable,
     fkCondition: target.fkCondition,
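
For links that start at the root table, the hunk above guesses the root-side join column: email-like child columns map to `email`, and everything else maps to `id`. A minimal sketch of that heuristic and the join string it produces follows. `guessRootColumn` and `formatJoin` are hypothetical names, and the `schema.table` rendering is an assumption about what `formatQualifiedTable` returns, based on the report lines earlier in this diff.

```typescript
interface QualifiedTable {
  schema: string;
  table: string;
}

// Child column names treated as email-like, copied from the hunk above.
const EMAIL_LIKE_COLUMNS = ["target_email", "user_email", "email_address"];

// Guesses which root column an orphan table's identity column joins against.
function guessRootColumn(childColumn: string): string {
  return EMAIL_LIKE_COLUMNS.includes(childColumn) ? "email" : "id";
}

// Renders the guessed join condition (assumed "schema.table.column" format).
function formatJoin(root: QualifiedTable, child: QualifiedTable, childColumn: string): string {
  const rootColumn = guessRootColumn(childColumn);
  return `${root.schema}.${root.table}.${rootColumn} = ${child.schema}.${child.table}.${childColumn}`;
}
```

The guess is only as good as the naming convention: a root table without `id` or `email` columns, or a `user_uuid` column that should join on a UUID column rather than `id`, would yield a join that does not match the data, so each generated join still needs manual review.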
@@ -83,6 +83,20 @@ export function renderIntrospectorYaml(draft: IntrospectorDraft): string {
     "legal_disclaimer:",
     " text: \"Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings.\"",
     "",
+    "# ===================================================================================",
+    "# HOW TO READ THIS MANIFEST:",
+    "# - 'targets': The list of tables the worker will delete/redact from.",
+    "# - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.",
+    "# - 'join': The SQL condition used to link the child table to the parent table.",
+    "# - 'pii_columns': Columns identified as containing Personal Identifiable Information.",
+    "# - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).",
+    "#",
+    "# IMPORTANT:",
+    "# 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.",
+    "# 2. Review all 'pii_columns' to ensure no sensitive columns were missed.",
+    "# 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.",
+    "# ===================================================================================",
+    "",
     "rules:",
     " - id: dpdp_standard",
     ` root_table: ${yamlScalar(formatQualifiedTable(draft.root))}`,