dpdp-erasure-cli 1.0.12 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md
CHANGED
@@ -1,31 +1,38 @@
-#
+# dpdp-erasure-cli
 
 [](https://badge.fury.io/js/dpdp-erasure-cli)
 
-The
+**The DPDP Erasure Engine CLI** is an automated, AI-assisted privacy toolkit that helps you securely discover, map, and cryptographically shred PII (Personally Identifiable Information) in your database.
 
-It
+It acts as the control plane for the [DPDP Erasure Engine](https://github.com/devxdh/dpdp-erasure-engine), allowing Data Protection Officers (DPOs) and Software Engineers to effortlessly comply with global privacy laws (DPDP, GDPR, CCPA) without writing manual SQL deletion scripts.
+
+---
+
+## 🎯 What does it do?
+
+Manually deleting a user across dozens of microservice tables is dangerous and prone to failure. `dpdp-erasure-cli` solves this by:
+
+1. **Introspection & NLP Mapping:** Safely scans your live database (using `TABLESAMPLE` block sampling) to find hidden PII in text columns, JSON blobs, and orphaned tables.
+2. **DAG Compilation:** Maps your entire Foreign Key graph to figure out the exact order tables must be deleted to avoid database constraint violations.
+3. **Drafting a Manifest:** Automatically generates a `compliance.worker.yml` erasure plan that handles HMAC redaction vs. hard deletion.
+4. **Cryptographic Signatures:** Locks the manifest using Ed25519 signatures so production deletion rules cannot be silently altered.
+5. **Dry-Run Simulations:** Tests the erasure locally in a rolled-back PostgreSQL transaction to prove it works before you deploy.
 
 ---
 
 ## 🚀 Installation
 
-This CLI
+This CLI relies on [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution.
 
 ```bash
 npm install -g dpdp-erasure-cli
 ```
 
-*Alternatively, if running from the monorepo root:*
-```bash
-bun run --cwd ./apps/worker cli <command>
-```
-
 ---
 
-## 🛠️ Interactive
+## 🛠️ Interactive Setup
 
-
+Don't want to memorize commands? Just run the CLI with no arguments to launch the interactive wizard:
 
 ```bash
 dpdp-cli
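The "DAG Compilation" step added to the README orders tables so that foreign-key constraints are never violated during deletion. The following is a minimal sketch of that idea, not the package's code. It uses Kahn's topological sort over hypothetical `FkEdge` records, where each edge means a child table references a parent table, and emits a child-first order. The names `FkEdge` and `deletionOrder` are illustrative assumptions.

```typescript
// Hypothetical sketch of FK-aware deletion ordering; not the package's implementation.
// An edge means `child` holds a foreign key that references `parent`.
interface FkEdge {
  child: string;
  parent: string;
}

// Orders tables so each child precedes every parent it references (Kahn's algorithm).
// Assumes every table named in `edges` also appears in `tables`.
function deletionOrder(tables: string[], edges: FkEdge[]): string[] {
  const pendingRefs = new Map<string, number>(); // FK edges still pointing at a table
  const parentsOf = new Map<string, string[]>(); // tables each table references
  for (const t of tables) {
    pendingRefs.set(t, 0);
    parentsOf.set(t, []);
  }
  for (const { child, parent } of edges) {
    if (child === parent) continue; // self-referencing FKs do not affect table order
    pendingRefs.set(parent, (pendingRefs.get(parent) ?? 0) + 1);
    parentsOf.get(child)?.push(parent);
  }

  // Leaf tables (referenced by nobody) can be emptied first.
  const ready = tables.filter((t) => pendingRefs.get(t) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const table = ready.shift()!;
    order.push(table);
    // Once this table's rows are gone, its references to each parent are released.
    for (const parent of parentsOf.get(table) ?? []) {
      const remaining = (pendingRefs.get(parent) ?? 0) - 1;
      pendingRefs.set(parent, remaining);
      if (remaining === 0) ready.push(parent);
    }
  }

  if (order.length !== tables.length) {
    throw new Error("Foreign-key cycle detected: no child-first deletion order exists");
  }
  return order;
}

const order = deletionOrder(
  ["public.users", "public.orders", "public.order_items"],
  [
    { child: "public.orders", parent: "public.users" },
    { child: "public.order_items", parent: "public.orders" },
  ],
);
// order: ["public.order_items", "public.orders", "public.users"]
```

Kahn's algorithm also exposes foreign-key cycles, which no child-first order can resolve. This diff does not show how the engine handles such cycles.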
@@ -33,108 +40,56 @@ dpdp-cli
 
 ---
 
-## 📚
+## 📚 Quick Start Guide
+
+Setting up your database for privacy compliance follows this simple 5-step workflow:
 
-### 1.
-Safely analyze your
+### 1. Introspect Your Database
+Safely analyze your schema to discover PII and draft the deletion manifest. The AI will even find logical links if you don't use strict Foreign Keys!
 
 ```bash
 dpdp-cli introspect \
---url postgres://user:pass@localhost:5432/app_db \
+--url "postgres://user:pass@localhost:5432/app_db" \
 --root public.users \
 --schema public \
---output ./compliance.worker.yml
+--output ./compliance.worker.yml
 ```
 
-
-
-* `-r, --root <table>`: The root table containing the user/subject identifier (e.g., `public.users`).
-* `-s, --schema <schema>`: The target PostgreSQL schema (defaults to `public`).
-* `-o, --output <path>`: Where to write the generated YAML draft (defaults to `compliance.worker.yml.draft`).
-* `-d, --max-depth <depth>`: Limit for recursive Foreign Key traversal (default: `32`).
-* `--sample-percent <percent>`: Percentage of data to sample using `TABLESAMPLE` for PII detection (default: `1`).
-* `--threshold <score>`: Confidence score required to flag a column as PII (default: `0.75`).
-* `--report <path>`: Write a readable Markdown report of the findings.
-
----
-
-### 2. `scan` (Quick PII Check)
-A lightweight, metadata-only schema scan that looks for potential PII columns based purely on naming conventions, without the heavy block sampling used by `introspect`.
-
-```bash
-dpdp-cli scan --url "postgres://user:pass@localhost:5432/app_db" --schema public
-```
-
----
-
-### 3. `keygen` (Security Provisioning)
-Provisions secure Ed25519 cryptographic keys required to sign your configuration manifest.
+### 2. Review and Attest
+Open the generated `compliance.worker.yml`. Review the `targets` and `join` conditions. Once you are confident, sign off by updating the `legal_attestation` block.
 
+### 3. Generate Security Keys
+Create a private/public keypair to securely sign your manifest for production environments.
 ```bash
 dpdp-cli keygen
 ```
-*This generates a private key file (e.g., `worker.pkcs8.key`) and a public key.*
-
----
-
-### 4. `sign` (Cryptographic Manifest Lock)
-To prevent unauthorized changes to data erasure rules in production, the manifest must be cryptographically signed by a Data Protection Officer (DPO) or Lead Engineer.
 
+### 4. Cryptographically Sign the Manifest
+Lock down the rules to prevent unauthorized changes in your CI/CD pipeline.
 ```bash
 dpdp-cli sign --config ./compliance.worker.yml --key ./worker.pkcs8.key
 ```
-*This generates a detached signature file (e.g., `compliance.worker.yml.sig`). The worker will fail to boot if this signature does not match the manifest.*
-
----
-
-### 5. `check-integrity` & `verify-schema` (CI/CD Gates)
-These commands are designed for CI/CD pipelines to ensure the live database schema matches the legal attestation hash stored in the signed manifest.
 
+### 5. Simulate an Erasure (Dry-Run)
+Test the erasure on a specific user. This command runs entirely within an isolated transaction that is automatically rolled back, so it is 100% safe.
 ```bash
-
-dpdp-cli check-integrity --url "postgres://.../app_db" --config ./compliance.worker.yml
-
-# Check only the live schema hash against the legal attestation
-dpdp-cli verify-schema --url "postgres://.../app_db" --config ./compliance.worker.yml
+dpdp-cli dry-run --id "user_12345" --url "postgres://user:pass@localhost:5432/app_db" --config ./compliance.worker.yml
 ```
-*If a developer adds a new column to the database without updating and re-signing the manifest, these commands will exit with a non-zero status.*
 
 ---
 
-
-Simulates a PII vault and redaction operation for a specific user. It runs inside an isolated transaction that is automatically rolled back.
+## 🔒 CI/CD Integrity Checks
 
-
-dpdp-cli dry-run --id "user_12345" --url "postgres://.../app_db" --config ./compliance.worker.yml
-```
-*This is the recommended safety check to help ensure your configuration captures related PII without breaking foreign keys.*
-
----
-
-### 7. `graph` (Dependency Visualization)
-Visualizes the recursive table dependencies (FK DAG) for a specific root table, helping you understand how data cascades down from a user.
+You can use the CLI in your GitHub Actions or GitLab CI to fail builds if a developer modifies the database schema without updating the signed compliance manifest:
 
 ```bash
-dpdp-cli
+dpdp-cli check-integrity --url "postgres://..." --config ./compliance.worker.yml
 ```
 
 ---
 
-##
-
-Setting up the engine generally follows this workflow:
+## 📖 Deep Dive
 
-
-`dpdp-cli introspect -u postgres://... -r public.users -s public -o compliance.worker.yml`
-2. **Review & Tweak** the `compliance.worker.yml` manually (fix false positives, add missing logical links, select masking actions like `HMAC` or `SET NULL`).
-3. **Generate Keys** for signing:
-`dpdp-cli keygen`
-4. **Sign** the finalized manifest:
-`dpdp-cli sign -c compliance.worker.yml -k worker.pkcs8.key`
-5. **Dry-Run** an erasure to verify it behaves as expected:
-`dpdp-cli dry-run -i "test_user_id" -u postgres://... -c compliance.worker.yml`
-6. **Deploy** the signed manifest and the detached `.sig` file to your production Worker.
-
----
+Want to understand the cryptographic shredding architecture under the hood? Read our full documentation at the main repository:
 
-
+**[DPDP Erasure Engine GitHub Repository](https://github.com/devxdh/dpdp-erasure-engine)**
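The 1.0.12 README said that `check-integrity` and `verify-schema` compare the live schema hash against the legal attestation hash stored in the signed manifest. The 1.1.0 README keeps `check-integrity` as a CI gate. The following is a hypothetical sketch of such a drift gate. Instead of hashing, it canonicalizes both column lists and diffs them directly, which makes the offending columns visible. `ColumnInfo`, `canonicalColumns`, `schemaDrift` and `hasDrift` are illustrative names, not the CLI's API.

```typescript
// Hypothetical schema-drift gate; not the dpdp-cli implementation.
interface ColumnInfo {
  schema: string;
  table: string;
  column: string;
  dataType: string;
}

// Canonical, order-independent form: one sorted "schema.table.column:type" string per column.
function canonicalColumns(columns: ColumnInfo[]): string[] {
  return columns.map((c) => `${c.schema}.${c.table}.${c.column}:${c.dataType}`).sort();
}

interface Drift {
  added: string[]; // present live but not in the attested snapshot
  removed: string[]; // attested but no longer present live
}

function schemaDrift(attested: ColumnInfo[], live: ColumnInfo[]): Drift {
  const before = new Set(canonicalColumns(attested));
  const after = new Set(canonicalColumns(live));
  return {
    added: Array.from(after).filter((c) => !before.has(c)),
    removed: Array.from(before).filter((c) => !after.has(c)),
  };
}

// A CI wrapper would exit non-zero whenever this returns true.
function hasDrift(d: Drift): boolean {
  return d.added.length > 0 || d.removed.length > 0;
}
```

Comparing canonical sets rather than a single digest trades compactness for a readable failure message. A signed manifest could still store a hash of the same canonical strings.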
package/package.json
CHANGED
@@ -52,9 +52,9 @@ async function hardDeleteSatelliteRows(
 
 while (true) {
 const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
-const deletedRows = await tx<{
+const deletedRows = await tx<{ ctid: string }[]>`
 WITH batch AS (
-SELECT
+SELECT ctid
 FROM ${tx(appSchema)}.${tx(tableName)}
 WHERE ${tx(lookupColumn)} = ${lookupValue}
 ${tenantFilter}
@@ -62,8 +62,8 @@ async function hardDeleteSatelliteRows(
 FOR UPDATE SKIP LOCKED
 )
 DELETE FROM ${tx(appSchema)}.${tx(tableName)}
-WHERE
-RETURNING
+WHERE ctid IN (SELECT ctid FROM batch)
+RETURNING ctid
 `;
 
 if (deletedRows.length === 0) {
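The `hardDeleteSatelliteRows` change completes the previously truncated `WHERE`/`RETURNING` clauses. A CTE locks a batch of matching rows by physical row id (`ctid`) with `FOR UPDATE SKIP LOCKED`, and the outer `DELETE` removes exactly those rows. The sketch below renders that statement as plain parameterized SQL so its shape is visible.

Line 61 of the source is hidden by the diff, so the `LIMIT` placeholder is an assumption. The `buildBatchDeleteSql` and `quoteIdent` names are also hypothetical. Real code should keep using the driver's identifier escaping (`tx(...)` in the diff).

```typescript
// Hypothetical sketch: renders the batched ctid DELETE as parameterized SQL.
// Assumption: the line hidden between the two hunks is a LIMIT on the batch size.

// Quote a PostgreSQL identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildBatchDeleteSql(
  schema: string,
  table: string,
  lookupColumn: string,
  withTenantFilter: boolean,
): string {
  const target = `${quoteIdent(schema)}.${quoteIdent(table)}`;
  // $1 = lookup value, $2 = tenant id (only when filtered), last parameter = batch size.
  const tenant = withTenantFilter ? " AND tenant_id = $2" : "";
  const limitParam = withTenantFilter ? "$3" : "$2";
  return [
    "WITH batch AS (",
    "  SELECT ctid",
    `  FROM ${target}`,
    `  WHERE ${quoteIdent(lookupColumn)} = $1${tenant}`,
    `  LIMIT ${limitParam}`,
    "  FOR UPDATE SKIP LOCKED",
    ")",
    `DELETE FROM ${target}`,
    "WHERE ctid IN (SELECT ctid FROM batch)",
    "RETURNING ctid",
  ].join("\n");
}
```

Deleting by `ctid` avoids requiring a primary key on satellite tables, and `SKIP LOCKED` lets concurrent workers claim disjoint batches. A `ctid` identifies a row version and can change when a row is updated or the table is rewritten. Selecting and deleting in one statement keeps the ids valid while the rows are locked.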
@@ -3,7 +3,7 @@ import type { Tsql } from "@/types";
 import { assertIdentifier } from "@/utils";
 
 interface SatelliteRowId {
-
+ctid: string;
 }
 
 async function yieldWorkerEventLoop(): Promise<void> {
@@ -78,7 +78,7 @@ export async function redactSatelliteTable(
 const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
 const updatedRows = await tx<SatelliteRowId[]>`
 WITH batch AS (
-SELECT
+SELECT ctid
 FROM ${tx(schema)}.${tx(table)}
 WHERE ${tx(safeLookupColumn)} = ${lookupValue}
 ${tenantFilter}
@@ -87,8 +87,8 @@ export async function redactSatelliteTable(
 )
 UPDATE ${tx(schema)}.${tx(table)}
 SET ${tx(safeLookupColumn)} = ${newHmacValue}
-WHERE
-RETURNING
+WHERE ctid IN (SELECT ctid FROM batch)
+RETURNING ctid
 `;
 
 if (updatedRows.length === 0) {
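In `redactSatelliteTable`, each batch overwrites the lookup column with `newHmacValue`. Redacted rows therefore stop matching `lookupValue`, and the loop ends once `updatedRows.length === 0`.

The following is a hypothetical in-memory model of that loop. The `Row` type and `redactInBatches` are illustrative, and the HMAC is replaced by a caller-supplied string. It makes the termination condition explicit and assumes the replacement differs from the lookup value; if the two were equal, the loop would never drain.

```typescript
// Hypothetical in-memory model of the batched redaction loop; not package code.
interface Row {
  id: number;
  lookup: string;
}

// Returns how many rows were redacted. Each pass takes up to batchSize rows that
// still match and overwrites their lookup value, modelling
// UPDATE ... WHERE ctid IN (SELECT ctid FROM batch) RETURNING ctid.
function redactInBatches(
  rows: Row[],
  lookupValue: string,
  replacement: string,
  batchSize: number,
): number {
  if (replacement === lookupValue) {
    // Redacted rows would keep matching, so the loop below would never end.
    throw new Error("replacement must differ from lookupValue");
  }
  if (batchSize < 1) throw new Error("batchSize must be positive");
  let total = 0;
  while (true) {
    const batch = rows.filter((r) => r.lookup === lookupValue).slice(0, batchSize);
    if (batch.length === 0) return total; // same exit as updatedRows.length === 0
    for (const row of batch) row.lookup = replacement;
    total += batch.length;
  }
}
```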
@@ -57,16 +57,24 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
 const sourceKey = targetKey(link.sourceTable.schema, link.sourceTable.table);
 const targetKeyStr = targetKey(link.targetTable.schema, link.targetTable.table);
 
+let parentCol = link.column;
+let childCol = link.column;
+
+// Attempt intelligent primary key mapping for orphaned root links
+if (link.sourceTable.schema === root.schema && link.sourceTable.table === root.table) {
+parentCol = ["target_email", "user_email", "email_address"].includes(link.column) ? "email" : "id";
+}
+
 if (dagTableKeys.has(sourceKey) && !dagTableKeys.has(targetKeyStr)) {
 dagTableKeys.add(targetKeyStr);
 logicalTargets.push({
 table: link.targetTable,
 parentTable: link.sourceTable,
 constraintName: null,
-childColumns: [
-parentColumns: [
+childColumns: [childCol],
+parentColumns: [parentCol],
 depth: maxDepth,
-fkCondition:
+fkCondition: `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`,
 });
 } else if (dagTableKeys.has(targetKeyStr) && !dagTableKeys.has(sourceKey)) {
 dagTableKeys.add(sourceKey);
@@ -74,10 +82,10 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
 table: link.sourceTable,
 parentTable: link.targetTable,
 constraintName: null,
-childColumns: [
-parentColumns: [
+childColumns: [childCol],
+parentColumns: [parentCol],
 depth: maxDepth,
-fkCondition:
+fkCondition: `${formatQualifiedTable(link.targetTable)}.${parentCol} = ${formatQualifiedTable(link.sourceTable)}.${childCol}`,
 });
 }
 }
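The new `runIntrospector` logic guesses join columns for logical (non-FK) links. The child side keeps the link's column. When the link starts at the root table, the parent side becomes `email` for three email-like column names and `id` otherwise.

The following is a standalone restatement of the first branch. It uses hypothetical `TableRef`/`LogicalLink` shapes and assumes `formatQualifiedTable` renders `schema.table`. It shows which `fkCondition` strings the rule produces.

```typescript
// Hypothetical restatement of the 1.1.0 join-guessing rule; types are illustrative.
interface TableRef {
  schema: string;
  table: string;
}

interface LogicalLink {
  sourceTable: TableRef;
  targetTable: TableRef;
  column: string;
}

// Assumed output format; the real formatQualifiedTable helper is not shown in the diff.
function formatQualifiedTable(t: TableRef): string {
  return `${t.schema}.${t.table}`;
}

const EMAIL_LIKE = ["target_email", "user_email", "email_address"];

// Join condition when the link's source table acts as the parent (first branch above).
function guessFkCondition(link: LogicalLink, root: TableRef): string {
  let parentCol = link.column;
  const childCol = link.column;
  const sourceIsRoot =
    link.sourceTable.schema === root.schema && link.sourceTable.table === root.table;
  if (sourceIsRoot) {
    parentCol = EMAIL_LIKE.indexOf(link.column) !== -1 ? "email" : "id";
  }
  return `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`;
}

const users: TableRef = { schema: "public", table: "users" };
const auditLog: TableRef = { schema: "public", table: "audit_log" };
const emailJoin = guessFkCondition({ sourceTable: users, targetTable: auditLog, column: "user_email" }, users);
// emailJoin: "public.users.email = public.audit_log.user_email"
const idJoin = guessFkCondition({ sourceTable: users, targetTable: auditLog, column: "user_id" }, users);
// idJoin: "public.users.id = public.audit_log.user_id"
```

The rule fires only when the link's source is the root table. A logical link between two non-root tables joins on the same column name on both sides. A root table with no `email` (or `id`) column would yield a condition that does not compile, which is the kind of guess the generated manifest header asks reviewers to verify.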
@@ -83,6 +83,20 @@ export function renderIntrospectorYaml(draft: IntrospectorDraft): string {
 "legal_disclaimer:",
 " text: \"Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings.\"",
 "",
+"# ===================================================================================",
+"# HOW TO READ THIS MANIFEST:",
+"# - 'targets': The list of tables the worker will delete/redact from.",
+"# - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.",
+"# - 'join': The SQL condition used to link the child table to the parent table.",
+"# - 'pii_columns': Columns identified as containing Personal Identifiable Information.",
+"# - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).",
+"#",
+"# IMPORTANT:",
+"# 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.",
+"# 2. Review all 'pii_columns' to ensure no sensitive columns were missed.",
+"# 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.",
+"# ===================================================================================",
+"",
 "rules:",
 " - id: dpdp_standard",
 ` root_table: ${yamlScalar(formatQualifiedTable(draft.root))}`,