dpdp-erasure-cli 1.0.12 → 1.1.0

package/README.md CHANGED
@@ -1,31 +1,38 @@
- # Operator CLI Reference Guide (`dpdp-erasure-cli`)
+ # dpdp-erasure-cli
 
  [![npm version](https://badge.fury.io/js/dpdp-erasure-cli.svg)](https://badge.fury.io/js/dpdp-erasure-cli)
 
- The **Operator CLI** (`dpdp-erasure-cli`) is the primary interface for data engineers, DevOps, and privacy operators to manage the DPDP Erasure Engine.
+ **The DPDP Erasure Engine CLI** is an automated, AI-assisted privacy toolkit that helps you securely discover, map, and cryptographically shred PII (Personally Identifiable Information) in your database.
 
- It is used to introspect databases, classify PII, manage compliance manifests, sign them cryptographically, and simulate safe erasure operations locally.
+ It acts as the control plane for the [DPDP Erasure Engine](https://github.com/devxdh/dpdp-erasure-engine), allowing Data Protection Officers (DPOs) and Software Engineers to effortlessly comply with global privacy laws (DPDP, GDPR, CCPA) without writing manual SQL deletion scripts.
+
+ ---
+
+ ## 🎯 What does it do?
+
+ Manually deleting a user across dozens of microservice tables is dangerous and prone to failure. `dpdp-erasure-cli` solves this by:
+
+ 1. **Introspection & NLP Mapping:** Safely scans your live database (using `TABLESAMPLE` block sampling) to find hidden PII in text columns, JSON blobs, and orphaned tables.
+ 2. **DAG Compilation:** Maps your entire Foreign Key graph to figure out the exact order tables must be deleted to avoid database constraint violations.
+ 3. **Drafting a Manifest:** Automatically generates a `compliance.worker.yml` erasure plan that handles HMAC redaction vs. hard deletion.
+ 4. **Cryptographic Signatures:** Locks the manifest using Ed25519 signatures so production deletion rules cannot be silently altered.
+ 5. **Dry-Run Simulations:** Tests the erasure locally in a rolled-back PostgreSQL transaction to prove it works before you deploy.
 
  ---
 
  ## 🚀 Installation
 
- This CLI requires [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution. Ensure you have Bun installed, then install the package globally:
+ This CLI relies on [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution.
 
  ```bash
  npm install -g dpdp-erasure-cli
  ```
 
- *Alternatively, if running from the monorepo root:*
- ```bash
- bun run --cwd ./apps/worker cli <command>
- ```
-
  ---
 
- ## 🛠️ Interactive Console
+ ## 🛠️ Interactive Setup
 
- If you run the CLI without any arguments, it will launch an interactive wizard to guide you through the available operations:
+ Don't want to memorize commands? Just run the CLI with no arguments to launch the interactive wizard:
 
  ```bash
  dpdp-cli
@@ -33,108 +40,56 @@ dpdp-cli
 
  ---
 
- ## 📚 Core Commands & Configuration Guide
+ ## 📚 Quick Start Guide
+
+ Setting up your database for privacy compliance follows this simple 5-step workflow:
 
- ### 1. `introspect` (The Core Command)
- Safely analyze your database's Foreign Key (FK) Directed Acyclic Graph (DAG) offline and draft a comprehensive PII mapping manifest (`compliance.worker.yml`). This is the first step in setting up the engine.
+ ### 1. Introspect Your Database
+ Safely analyze your schema to discover PII and draft the deletion manifest. The AI will even find logical links if you don't use strict Foreign Keys!
 
  ```bash
  dpdp-cli introspect \
- --url postgres://user:pass@localhost:5432/app_db \
+ --url "postgres://user:pass@localhost:5432/app_db" \
  --root public.users \
  --schema public \
- --output ./compliance.worker.yml.draft
+ --output ./compliance.worker.yml
  ```
 
- **Options:**
- * `-u, --url <url>`: PostgreSQL Connection DSN.
- * `-r, --root <table>`: The root table containing the user/subject identifier (e.g., `public.users`).
- * `-s, --schema <schema>`: The target PostgreSQL schema (defaults to `public`).
- * `-o, --output <path>`: Where to write the generated YAML draft (defaults to `compliance.worker.yml.draft`).
- * `-d, --max-depth <depth>`: Limit for recursive Foreign Key traversal (default: `32`).
- * `--sample-percent <percent>`: Percentage of data to sample using `TABLESAMPLE` for PII detection (default: `1`).
- * `--threshold <score>`: Confidence score required to flag a column as PII (default: `0.75`).
- * `--report <path>`: Write a readable Markdown report of the findings.
-
- ---
-
- ### 2. `scan` (Quick PII Check)
- A lightweight, metadata-only schema scan that looks for potential PII columns based purely on naming conventions, without the heavy block sampling used by `introspect`.
-
- ```bash
- dpdp-cli scan --url "postgres://user:pass@localhost:5432/app_db" --schema public
- ```
-
- ---
-
- ### 3. `keygen` (Security Provisioning)
- Provisions secure Ed25519 cryptographic keys required to sign your configuration manifest.
+ ### 2. Review and Attest
+ Open the generated `compliance.worker.yml`. Review the `targets` and `join` conditions. Once you are confident, sign off by updating the `legal_attestation` block.
 
+ ### 3. Generate Security Keys
+ Create a private/public keypair to securely sign your manifest for production environments.
  ```bash
  dpdp-cli keygen
  ```
- *This generates a private key file (e.g., `worker.pkcs8.key`) and a public key.*
-
- ---
-
- ### 4. `sign` (Cryptographic Manifest Lock)
- To prevent unauthorized changes to data erasure rules in production, the manifest must be cryptographically signed by a Data Protection Officer (DPO) or Lead Engineer.
 
+ ### 4. Cryptographically Sign the Manifest
+ Lock down the rules to prevent unauthorized changes in your CI/CD pipeline.
  ```bash
  dpdp-cli sign --config ./compliance.worker.yml --key ./worker.pkcs8.key
  ```
- *This generates a detached signature file (e.g., `compliance.worker.yml.sig`). The worker will fail to boot if this signature does not match the manifest.*
-
- ---
-
- ### 5. `check-integrity` & `verify-schema` (CI/CD Gates)
- These commands are designed for CI/CD pipelines to ensure the live database schema matches the legal attestation hash stored in the signed manifest.
 
+ ### 5. Simulate an Erasure (Dry-Run)
+ Test the erasure on a specific user. This command runs entirely within an isolated transaction that is automatically rolled back, so it is 100% safe.
  ```bash
- # Verify the compiled DAG and live schema hash
- dpdp-cli check-integrity --url "postgres://.../app_db" --config ./compliance.worker.yml
-
- # Check only the live schema hash against the legal attestation
- dpdp-cli verify-schema --url "postgres://.../app_db" --config ./compliance.worker.yml
+ dpdp-cli dry-run --id "user_12345" --url "postgres://user:pass@localhost:5432/app_db" --config ./compliance.worker.yml
  ```
- *If a developer adds a new column to the database without updating and re-signing the manifest, these commands will exit with a non-zero status.*
 
  ---
 
- ### 6. `dry-run` (Safe Erasure Simulation)
- Simulates a PII vault and redaction operation for a specific user. It runs inside an isolated transaction that is automatically rolled back.
+ ## 🔒 CI/CD Integrity Checks
 
- ```bash
- dpdp-cli dry-run --id "user_12345" --url "postgres://.../app_db" --config ./compliance.worker.yml
- ```
- *This is the recommended safety check to help ensure your configuration captures related PII without breaking foreign keys.*
-
- ---
-
- ### 7. `graph` (Dependency Visualization)
- Visualizes the recursive table dependencies (FK DAG) for a specific root table, helping you understand how data cascades down from a user.
+ You can use the CLI in your GitHub Actions or GitLab CI to fail builds if a developer modifies the database schema without updating the signed compliance manifest:
 
  ```bash
- dpdp-cli graph --table public.users --url "postgres://.../app_db"
+ dpdp-cli check-integrity --url "postgres://..." --config ./compliance.worker.yml
  ```
 
  ---
 
- ## ⚙️ Standard Workflow Example
-
- Setting up the engine generally follows this workflow:
+ ## 📖 Deep Dive
 
- 1. **Introspect** the database to generate a draft manifest:
- `dpdp-cli introspect -u postgres://... -r public.users -s public -o compliance.worker.yml`
- 2. **Review & Tweak** the `compliance.worker.yml` manually (fix false positives, add missing logical links, select masking actions like `HMAC` or `SET NULL`).
- 3. **Generate Keys** for signing:
- `dpdp-cli keygen`
- 4. **Sign** the finalized manifest:
- `dpdp-cli sign -c compliance.worker.yml -k worker.pkcs8.key`
- 5. **Dry-Run** an erasure to verify it behaves as expected:
- `dpdp-cli dry-run -i "test_user_id" -u postgres://... -c compliance.worker.yml`
- 6. **Deploy** the signed manifest and the detached `.sig` file to your production Worker.
-
- ---
+ Want to understand the cryptographic shredding architecture under the hood? Read our full documentation at the main repository:
 
- For architectural details and deep-dives into how the cryptographic shredding works, refer to the [Main Documentation](https://github.com/devxdh/dpdp-erasure-engine/tree/main/apps/docs).
+ **[DPDP Erasure Engine GitHub Repository](https://github.com/devxdh/dpdp-erasure-engine)**
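
Reviewer note on the "DAG Compilation" step in the new README: the sketch below shows one way to order tables so that each child is deleted before the parent it references. It is only an illustration under stated assumptions, not the package's implementation. The `ForeignKey` shape and the `deletionOrder` name are hypothetical, and the sketch assumes every foreign key refers to a table in the input list.

```typescript
// Hypothetical shape: `child` holds a foreign key that references `parent`.
interface ForeignKey {
  child: string;
  parent: string;
}

// Returns tables ordered so that every child precedes the parent it references
// (Kahn's algorithm). Throws if the foreign keys form a cycle between tables.
function deletionOrder(tables: string[], fks: ForeignKey[]): string[] {
  // For each table, count the child tables that must be deleted before it.
  const pendingChildren = new Map<string, number>();
  for (const t of tables) pendingChildren.set(t, 0);

  const parentsOf = new Map<string, string[]>();
  for (const fk of fks) {
    if (fk.child === fk.parent) continue; // a self-reference does not order two tables
    pendingChildren.set(fk.parent, (pendingChildren.get(fk.parent) ?? 0) + 1);
    parentsOf.set(fk.child, [...(parentsOf.get(fk.child) ?? []), fk.parent]);
  }

  // Tables that nothing references can be deleted first.
  const ready = tables.filter((t) => pendingChildren.get(t) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const table = ready.shift() as string;
    order.push(table);
    for (const parent of parentsOf.get(table) ?? []) {
      const remaining = (pendingChildren.get(parent) ?? 0) - 1;
      pendingChildren.set(parent, remaining);
      if (remaining === 0) ready.push(parent);
    }
  }

  if (order.length !== tables.length) {
    throw new Error("foreign keys form a cycle; no safe deletion order exists");
  }
  return order;
}
```

For a chain where `public.order_items` references `public.orders`, which references `public.users`, the function returns the tables in that same order, leaving the root table for last.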
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "dpdp-erasure-cli",
- "version": "1.0.12",
+ "version": "1.1.0",
  "license": "Apache-2.0",
  "keywords": [
  "dpdp",
@@ -52,9 +52,9 @@ async function hardDeleteSatelliteRows(
 
  while (true) {
  const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
- const deletedRows = await tx<{ id: string | number }[]>`
+ const deletedRows = await tx<{ ctid: string }[]>`
  WITH batch AS (
- SELECT id
+ SELECT ctid
  FROM ${tx(appSchema)}.${tx(tableName)}
  WHERE ${tx(lookupColumn)} = ${lookupValue}
  ${tenantFilter}
@@ -62,8 +62,8 @@ async function hardDeleteSatelliteRows(
  FOR UPDATE SKIP LOCKED
  )
  DELETE FROM ${tx(appSchema)}.${tx(tableName)}
- WHERE id IN (SELECT id FROM batch)
- RETURNING id
+ WHERE ctid IN (SELECT ctid FROM batch)
+ RETURNING ctid
  `;
 
  if (deletedRows.length === 0) {
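
Reviewer note on the hunk above: it keys each batch on `ctid`, the PostgreSQL system column that records a row's physical location within its table, so satellite tables no longer need an `id` column. The sketch below shows the general shape of such a loop without a database connection. `quoteIdent`, `buildCtidBatchDelete`, `BatchRunner`, and `deleteInBatches` are hypothetical names, and the `LIMIT` clause is an assumption because the diff elides that line.

```typescript
// Quote a PostgreSQL identifier by doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build one batched DELETE keyed on ctid. $1 is the lookup value, $2 the batch size.
// Assumption: the LIMIT line is illustrative; the diff does not show it.
function buildCtidBatchDelete(schema: string, table: string, lookupColumn: string): string {
  const target = `${quoteIdent(schema)}.${quoteIdent(table)}`;
  return [
    "WITH batch AS (",
    `  SELECT ctid FROM ${target}`,
    `  WHERE ${quoteIdent(lookupColumn)} = $1`,
    "  LIMIT $2",
    "  FOR UPDATE SKIP LOCKED",
    ")",
    `DELETE FROM ${target}`,
    "WHERE ctid IN (SELECT ctid FROM batch)",
    "RETURNING ctid",
  ].join("\n");
}

// Hypothetical executor: runs the statement once and returns the deleted rows.
type BatchRunner = (sql: string) => Promise<{ ctid: string }[]>;

// Repeat the batch until a round deletes nothing, returning the total row count.
async function deleteInBatches(sql: string, runBatch: BatchRunner): Promise<number> {
  let total = 0;
  while (true) {
    const rows = await runBatch(sql);
    if (rows.length === 0) return total;
    total += rows.length;
  }
}
```

One point that may be worth verifying against the real schema: `ctid` is unique only within a single physical table, so if a satellite table is partitioned, matching on `ctid` alone could be ambiguous and pairing it with `tableoid` may be needed.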
@@ -3,7 +3,7 @@ import type { Tsql } from "@/types";
  import { assertIdentifier } from "@/utils";
 
  interface SatelliteRowId {
- id: number;
+ ctid: string;
  }
 
  async function yieldWorkerEventLoop(): Promise<void> {
@@ -78,7 +78,7 @@ export async function redactSatelliteTable(
  const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
  const updatedRows = await tx<SatelliteRowId[]>`
  WITH batch AS (
- SELECT id
+ SELECT ctid
  FROM ${tx(schema)}.${tx(table)}
  WHERE ${tx(safeLookupColumn)} = ${lookupValue}
  ${tenantFilter}
@@ -87,8 +87,8 @@ export async function redactSatelliteTable(
  )
  UPDATE ${tx(schema)}.${tx(table)}
  SET ${tx(safeLookupColumn)} = ${newHmacValue}
- WHERE ctid IN (SELECT ctid FROM batch)
- RETURNING ctid
+ WHERE ctid IN (SELECT ctid FROM batch)
+ RETURNING ctid
  `;
 
  if (updatedRows.length === 0) {
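
Reviewer note on the redaction hunk above: the `UPDATE` overwrites the lookup column with `newHmacValue`, so rows that were already redacted stop matching the `WHERE` clause and the loop can terminate. How `newHmacValue` is derived is not shown in the diff. The sketch below is one plausible derivation, assuming HMAC-SHA256 with a hex digest; the `hmacPseudonym` name, the hash choice, and the encoding are all assumptions.

```typescript
import { createHmac } from "node:crypto";

// Derive a stable, keyed pseudonym for a value.
// Assumption: HMAC-SHA256 with hex output; the package may use a different hash or encoding.
function hmacPseudonym(secret: string, value: string): string {
  return createHmac("sha256", secret).update(value, "utf8").digest("hex");
}
```

The same secret and input always yield the same 64-character hex string, which is what lets redacted rows in different tables still be correlated with each other without exposing the original value.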
@@ -57,16 +57,24 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
  const sourceKey = targetKey(link.sourceTable.schema, link.sourceTable.table);
  const targetKeyStr = targetKey(link.targetTable.schema, link.targetTable.table);
 
+ let parentCol = link.column;
+ let childCol = link.column;
+
+ // Attempt intelligent primary key mapping for orphaned root links
+ if (link.sourceTable.schema === root.schema && link.sourceTable.table === root.table) {
+ parentCol = ["target_email", "user_email", "email_address"].includes(link.column) ? "email" : "id";
+ }
+
  if (dagTableKeys.has(sourceKey) && !dagTableKeys.has(targetKeyStr)) {
  dagTableKeys.add(targetKeyStr);
  logicalTargets.push({
  table: link.targetTable,
  parentTable: link.sourceTable,
  constraintName: null,
- childColumns: [link.column],
- parentColumns: [link.column],
+ childColumns: [childCol],
+ parentColumns: [parentCol],
  depth: maxDepth,
- fkCondition: `LOGICAL_LINK (${link.column})`,
+ fkCondition: `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`,
  });
  } else if (dagTableKeys.has(targetKeyStr) && !dagTableKeys.has(sourceKey)) {
  dagTableKeys.add(sourceKey);
@@ -74,10 +82,10 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
  table: link.sourceTable,
  parentTable: link.targetTable,
  constraintName: null,
- childColumns: [link.column],
- parentColumns: [link.column],
+ childColumns: [childCol],
+ parentColumns: [parentCol],
  depth: maxDepth,
- fkCondition: `LOGICAL_LINK (${link.column})`,
+ fkCondition: `${formatQualifiedTable(link.targetTable)}.${parentCol} = ${formatQualifiedTable(link.sourceTable)}.${childCol}`,
  });
  }
  }
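
Reviewer note on the introspector hunk above: when the parent of a logical link is the root table, the new code guesses the parent column (`email` for a few email-like child column names, otherwise `id`) and emits a real join condition instead of the old `LOGICAL_LINK (...)` placeholder. The sketch below restates that heuristic as a standalone function so it can be exercised in isolation. `TableRef`, `qualified`, and `inferLogicalJoin` are hypothetical names, and `qualified` only approximates whatever `formatQualifiedTable` produces.

```typescript
interface TableRef {
  schema: string;
  table: string;
}

// Child column names the diff treats as references to an email column on the root table.
const EMAIL_ALIASES = ["target_email", "user_email", "email_address"];

// Hypothetical stand-in for the package's formatQualifiedTable.
function qualified(ref: TableRef): string {
  return `${ref.schema}.${ref.table}`;
}

// Guess the join between a parent and a child that share no declared foreign key.
function inferLogicalJoin(
  root: TableRef,
  parent: TableRef,
  child: TableRef,
  linkColumn: string,
): { parentCol: string; childCol: string; condition: string } {
  const parentIsRoot = parent.schema === root.schema && parent.table === root.table;
  // Only links that hang off the root get the primary-key guess; others reuse the column name.
  const parentCol = parentIsRoot
    ? (EMAIL_ALIASES.includes(linkColumn) ? "email" : "id")
    : linkColumn;
  const childCol = linkColumn;
  const condition = `${qualified(parent)}.${parentCol} = ${qualified(child)}.${childCol}`;
  return { parentCol, childCol, condition };
}
```

Because the root-table guess falls back to `id` for any column name that is not in the alias list, a root table whose key has a different name would get a wrong join, which is presumably why the generated manifest asks for every guessed join to be reviewed.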
@@ -83,6 +83,20 @@ export function renderIntrospectorYaml(draft: IntrospectorDraft): string {
  "legal_disclaimer:",
  " text: \"Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings.\"",
  "",
+ "# ===================================================================================",
+ "# HOW TO READ THIS MANIFEST:",
+ "# - 'targets': The list of tables the worker will delete/redact from.",
+ "# - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.",
+ "# - 'join': The SQL condition used to link the child table to the parent table.",
+ "# - 'pii_columns': Columns identified as containing Personal Identifiable Information.",
+ "# - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).",
+ "#",
+ "# IMPORTANT:",
+ "# 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.",
+ "# 2. Review all 'pii_columns' to ensure no sensitive columns were missed.",
+ "# 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.",
+ "# ===================================================================================",
+ "",
  "rules:",
  " - id: dpdp_standard",
  ` root_table: ${yamlScalar(formatQualifiedTable(draft.root))}`,
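
Reviewer note on the manifest header above: it tells the operator to replace `PENDING_REVIEW` before relying on the manifest, and the README says the manifest is then locked with an Ed25519 signature. The sketch below combines the two ideas using Node's `crypto` module, which Bun also provides. It is not the CLI's `sign` command: `signManifest` and `verifyManifest` are hypothetical names, refusing to sign while `PENDING_REVIEW` is present is an assumed policy, and signing the raw UTF-8 text of the file is an assumption about the signed payload.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Produce a detached Ed25519 signature over the manifest text.
// Assumed policy: refuse to sign while the review placeholder is still present.
function signManifest(manifest: string, privateKey: KeyObject): Buffer {
  if (manifest.includes("PENDING_REVIEW")) {
    throw new Error("manifest still contains PENDING_REVIEW; attest before signing");
  }
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  return sign(null, Buffer.from(manifest, "utf8"), privateKey);
}

// Check a detached signature against the manifest text.
function verifyManifest(manifest: string, publicKey: KeyObject, signature: Buffer): boolean {
  return verify(null, Buffer.from(manifest, "utf8"), publicKey, signature);
}

// Usage with an in-memory keypair and a placeholder manifest body.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const manifestText = "rules:\n  - id: dpdp_standard\n";
const detachedSignature = signManifest(manifestText, privateKey);
console.log(verifyManifest(manifestText, publicKey, detachedSignature));
```

Any change to the manifest text after signing, even a single added column entry, makes `verifyManifest` return `false`, which is the property the README relies on for its CI/CD integrity check.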