dpdp-erasure-cli 1.0.12 → 1.1.1

package/README.md CHANGED
@@ -1,31 +1,51 @@
1
- # Operator CLI Reference Guide (`dpdp-erasure-cli`)
1
+ # dpdp-erasure-cli
2
2
 
3
- [![npm version](https://badge.fury.io/js/dpdp-erasure-cli.svg)](https://badge.fury.io/js/dpdp-erasure-cli)
3
+ [![npm version](https://img.shields.io/npm/v/dpdp-erasure-cli?color=14b8a6&style=flat-square)](https://www.npmjs.com/package/dpdp-erasure-cli)
4
4
 
5
- The **Operator CLI** (`dpdp-erasure-cli`) is the primary interface for data engineers, DevOps, and privacy operators to manage the DPDP Erasure Engine.
5
+ **The DPDP Erasure Engine CLI** is an automated, AI-assisted privacy toolkit that helps you securely discover, map, and cryptographically shred PII (Personally Identifiable Information) in your database.
6
6
 
7
- It is used to introspect databases, classify PII, manage compliance manifests, sign them cryptographically, and simulate safe erasure operations locally.
7
+ It acts as the control plane for the [DPDP Erasure Engine](https://github.com/devxdh/dpdp-erasure-engine), allowing Data Protection Officers (DPOs) and Software Engineers to effortlessly comply with global privacy laws (DPDP, GDPR, CCPA) without writing manual SQL deletion scripts.
8
+
9
+ ---
10
+
11
+ ## 🎯 What does it do?
12
+
13
+ Manually deleting a user across dozens of microservice tables is dangerous and prone to failure. `dpdp-erasure-cli` solves this by:
14
+
15
+ 1. **Introspection & NLP Mapping:** Safely scans your live database (using `TABLESAMPLE` block sampling) to find hidden PII in text columns, JSON blobs, and orphaned tables.
16
+ 2. **DAG Compilation:** Maps your entire Foreign Key graph to figure out the exact order tables must be deleted to avoid database constraint violations.
17
+ 3. **Drafting a Manifest:** Automatically generates a `compliance.worker.yml` erasure plan that handles HMAC redaction vs. hard deletion.
18
+ 4. **Cryptographic Signatures:** Locks the manifest using Ed25519 signatures so production deletion rules cannot be silently altered.
19
+ 5. **Dry-Run Simulations:** Tests the erasure locally in a rolled-back PostgreSQL transaction to prove it works before you deploy.
20
+
21
+ ---
22
+
23
+ ## ⚠️ Introspector Limitations (100% Transparency)
24
+
25
+ While our Introspector is incredibly powerful at analyzing metadata, foreign keys, and block-sampling text to find common identifiers (like Emails, Phone Numbers, Aadhaar, PAN, SSN, Credit Cards), it is fundamentally a regex and heuristic engine, not a sentient AI.
26
+
27
+ **What we CANNOT do:**
28
+ 1. **Generic Column Names:** If your production database has a column named `info` or `data` and it happens to contain a user's *First Name* or *Last Name* embedded inside a generic string, our engine cannot confidently flag it as PII. We can only guess "Name" PII if the column is named descriptively (e.g., `full_name`, `first_name`, `last_name`).
29
+ 2. **Passwords, Tokens, and Secrets:** We cannot differentiate a random SHA-256 password hash or an API token from an ordinary ID string unless the column has a clear name like `password`, `secret`, `token`, or `api_key`.
30
+ 3. **Roles and Permissions:** Similarly, we cannot guess if an integer or string denotes an administrative permission unless the column gives us a hint (like `role_id` or `access_level`).
31
+
32
+ **The Solution:** The Introspector is designed to do 95% of the heavy lifting. **The remaining 5% requires a human DPO or Developer.** You must always review the generated `compliance.worker.yml` and manually add any deeply hidden sensitive columns before deploying.
8
33
 
9
34
  ---
10
35
 
11
36
  ## 🚀 Installation
12
37
 
13
- This CLI requires [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution. Ensure you have Bun installed, then install the package globally:
38
+ This CLI relies on [Bun](https://bun.sh/) for native cryptographic bindings and high-performance execution.
14
39
 
15
40
  ```bash
16
41
  npm install -g dpdp-erasure-cli
17
42
  ```
18
43
 
19
- *Alternatively, if running from the monorepo root:*
20
- ```bash
21
- bun run --cwd ./apps/worker cli <command>
22
- ```
23
-
24
44
  ---
25
45
 
26
- ## 🛠️ Interactive Console
46
+ ## 🛠️ Interactive Setup
27
47
 
28
- If you run the CLI without any arguments, it will launch an interactive wizard to guide you through the available operations:
48
+ Don't want to memorize commands? Just run the CLI with no arguments to launch the interactive wizard:
29
49
 
30
50
  ```bash
31
51
  dpdp-cli
@@ -33,108 +53,56 @@ dpdp-cli
33
53
 
34
54
  ---
35
55
 
36
- ## 📚 Core Commands & Configuration Guide
56
+ ## 📚 Quick Start Guide
57
+
58
+ Setting up your database for privacy compliance follows this simple 5-step workflow:
37
59
 
38
- ### 1. `introspect` (The Core Command)
39
- Safely analyze your database's Foreign Key (FK) Directed Acyclic Graph (DAG) offline and draft a comprehensive PII mapping manifest (`compliance.worker.yml`). This is the first step in setting up the engine.
60
+ ### 1. Introspect Your Database
61
+ Safely analyze your schema to discover PII and draft the deletion manifest. The AI will even find logical links if you don't use strict Foreign Keys!
40
62
 
41
63
  ```bash
42
64
  dpdp-cli introspect \
43
- --url postgres://user:pass@localhost:5432/app_db \
65
+ --url "postgres://user:pass@localhost:5432/app_db" \
44
66
  --root public.users \
45
67
  --schema public \
46
- --output ./compliance.worker.yml.draft
68
+ --output ./compliance.worker.yml
47
69
  ```
48
70
 
49
- **Options:**
50
- * `-u, --url <url>`: PostgreSQL Connection DSN.
51
- * `-r, --root <table>`: The root table containing the user/subject identifier (e.g., `public.users`).
52
- * `-s, --schema <schema>`: The target PostgreSQL schema (defaults to `public`).
53
- * `-o, --output <path>`: Where to write the generated YAML draft (defaults to `compliance.worker.yml.draft`).
54
- * `-d, --max-depth <depth>`: Limit for recursive Foreign Key traversal (default: `32`).
55
- * `--sample-percent <percent>`: Percentage of data to sample using `TABLESAMPLE` for PII detection (default: `1`).
56
- * `--threshold <score>`: Confidence score required to flag a column as PII (default: `0.75`).
57
- * `--report <path>`: Write a readable Markdown report of the findings.
58
-
59
- ---
60
-
61
- ### 2. `scan` (Quick PII Check)
62
- A lightweight, metadata-only schema scan that looks for potential PII columns based purely on naming conventions, without the heavy block sampling used by `introspect`.
63
-
64
- ```bash
65
- dpdp-cli scan --url "postgres://user:pass@localhost:5432/app_db" --schema public
66
- ```
67
-
68
- ---
69
-
70
- ### 3. `keygen` (Security Provisioning)
71
- Provisions secure Ed25519 cryptographic keys required to sign your configuration manifest.
71
+ ### 2. Review and Attest
72
+ Open the generated `compliance.worker.yml`. Review the `targets` and `join` conditions. Once you are confident, sign off by updating the `legal_attestation` block.
72
73
 
74
+ ### 3. Generate Security Keys
75
+ Create a private/public keypair to securely sign your manifest for production environments.
73
76
  ```bash
74
77
  dpdp-cli keygen
75
78
  ```
76
- *This generates a private key file (e.g., `worker.pkcs8.key`) and a public key.*
77
-
78
- ---
79
-
80
- ### 4. `sign` (Cryptographic Manifest Lock)
81
- To prevent unauthorized changes to data erasure rules in production, the manifest must be cryptographically signed by a Data Protection Officer (DPO) or Lead Engineer.
82
79
 
80
+ ### 4. Cryptographically Sign the Manifest
81
+ Lock down the rules to prevent unauthorized changes in your CI/CD pipeline.
83
82
  ```bash
84
83
  dpdp-cli sign --config ./compliance.worker.yml --key ./worker.pkcs8.key
85
84
  ```
86
- *This generates a detached signature file (e.g., `compliance.worker.yml.sig`). The worker will fail to boot if this signature does not match the manifest.*
87
-
88
- ---
89
-
90
- ### 5. `check-integrity` & `verify-schema` (CI/CD Gates)
91
- These commands are designed for CI/CD pipelines to ensure the live database schema matches the legal attestation hash stored in the signed manifest.
92
85
 
86
+ ### 5. Simulate an Erasure (Dry-Run)
87
+ Test the erasure on a specific user. This command runs entirely within an isolated transaction that is automatically rolled back, so it is 100% safe.
93
88
  ```bash
94
- # Verify the compiled DAG and live schema hash
95
- dpdp-cli check-integrity --url "postgres://.../app_db" --config ./compliance.worker.yml
96
-
97
- # Check only the live schema hash against the legal attestation
98
- dpdp-cli verify-schema --url "postgres://.../app_db" --config ./compliance.worker.yml
89
+ dpdp-cli dry-run --id "user_12345" --url "postgres://user:pass@localhost:5432/app_db" --config ./compliance.worker.yml
99
90
  ```
100
- *If a developer adds a new column to the database without updating and re-signing the manifest, these commands will exit with a non-zero status.*
101
91
 
102
92
  ---
103
93
 
104
- ### 6. `dry-run` (Safe Erasure Simulation)
105
- Simulates a PII vault and redaction operation for a specific user. It runs inside an isolated transaction that is automatically rolled back.
94
+ ## 🔒 CI/CD Integrity Checks
106
95
 
107
- ```bash
108
- dpdp-cli dry-run --id "user_12345" --url "postgres://.../app_db" --config ./compliance.worker.yml
109
- ```
110
- *This is the recommended safety check to help ensure your configuration captures related PII without breaking foreign keys.*
111
-
112
- ---
113
-
114
- ### 7. `graph` (Dependency Visualization)
115
- Visualizes the recursive table dependencies (FK DAG) for a specific root table, helping you understand how data cascades down from a user.
96
+ You can use the CLI in your GitHub Actions or GitLab CI to fail builds if a developer modifies the database schema without updating the signed compliance manifest:
116
97
 
117
98
  ```bash
118
- dpdp-cli graph --table public.users --url "postgres://.../app_db"
99
+ dpdp-cli check-integrity --url "postgres://..." --config ./compliance.worker.yml
119
100
  ```
120
101
 
121
102
  ---
122
103
 
123
- ## ⚙️ Standard Workflow Example
124
-
125
- Setting up the engine generally follows this workflow:
104
+ ## 📖 Deep Dive
126
105
 
127
- 1. **Introspect** the database to generate a draft manifest:
128
- `dpdp-cli introspect -u postgres://... -r public.users -s public -o compliance.worker.yml`
129
- 2. **Review & Tweak** the `compliance.worker.yml` manually (fix false positives, add missing logical links, select masking actions like `HMAC` or `SET NULL`).
130
- 3. **Generate Keys** for signing:
131
- `dpdp-cli keygen`
132
- 4. **Sign** the finalized manifest:
133
- `dpdp-cli sign -c compliance.worker.yml -k worker.pkcs8.key`
134
- 5. **Dry-Run** an erasure to verify it behaves as expected:
135
- `dpdp-cli dry-run -i "test_user_id" -u postgres://... -c compliance.worker.yml`
136
- 6. **Deploy** the signed manifest and the detached `.sig` file to your production Worker.
137
-
138
- ---
106
+ Want to understand the cryptographic shredding architecture under the hood? Read our full documentation at the main repository:
139
107
 
140
- For architectural details and deep-dives into how the cryptographic shredding works, refer to the [Main Documentation](https://github.com/devxdh/dpdp-erasure-engine/tree/main/apps/docs).
108
+ **[DPDP Erasure Engine GitHub Repository](https://github.com/devxdh/dpdp-erasure-engine)**
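The README's "DAG Compilation" step says the CLI orders tables so that deletes do not violate database constraints. As a rough illustration only, the sketch below computes a child-first order with a post-order depth-first walk over `parent` links. The `Target` shape, the `childFirstOrder` helper, and the `public.orders` / `public.order_items` tables are hypothetical and are not taken from the package; the real compiler may also choose parent-first ordering, which this sketch does not cover.

```typescript
// Hypothetical shape: one manifest target and the table that owns it.
interface Target {
  table: string;
  parent: string | null;
}

// Returns tables so that every child appears before its parent (safe delete order
// when foreign keys do not cascade). Throws if the parent links form a cycle.
function childFirstOrder(root: string, targets: Target[]): string[] {
  const children = new Map<string, string[]>();
  for (const t of targets) {
    if (t.parent === null) continue;
    const list = children.get(t.parent) ?? [];
    list.push(t.table);
    children.set(t.parent, list);
  }

  const order: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  const visit = (table: string): void => {
    if (done.has(table)) return;
    if (visiting.has(table)) throw new Error(`cycle detected at ${table}`);
    visiting.add(table);
    for (const child of children.get(table) ?? []) visit(child);
    visiting.delete(table);
    done.add(table);
    order.push(table); // post-order: emitted only after all descendants
  };

  visit(root);
  return order;
}

const targets: Target[] = [
  { table: "public.abandoned_carts", parent: "public.users" },
  { table: "public.audit_logs", parent: "public.users" },
  { table: "public.orders", parent: "public.users" },        // hypothetical table
  { table: "public.order_items", parent: "public.orders" },  // hypothetical table
];

const order = childFirstOrder("public.users", targets);
console.log(order.join(" -> "));
// public.abandoned_carts -> public.audit_logs -> public.order_items -> public.orders -> public.users
```

The root table comes last, so rows that reference `public.users` are gone before the user row itself is removed.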
@@ -1,6 +1,6 @@
1
1
  # AUTO-GENERATED BY INTROSPECTOR
2
2
  # REVIEW REQUIRED: DPO must validate every table, join condition, and PII column before production use.
3
- # Generated At: 2026-06-12T05:19:07.235Z
3
+ # Generated At: 2026-06-12T08:15:53.497Z
4
4
 
5
5
  legal_attestation:
6
6
  dpo_identifier: PENDING_REVIEW
@@ -13,6 +13,20 @@ legal_attestation:
13
13
  legal_disclaimer:
14
14
  text: "Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings."
15
15
 
16
+ # ===================================================================================
17
+ # HOW TO READ THIS MANIFEST:
18
+ # - 'targets': The list of tables the worker will delete/redact from.
19
+ # - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.
20
+ # - 'join': The SQL condition used to link the child table to the parent table.
21
+ # - 'pii_columns': Columns identified as containing Personal Identifiable Information.
22
+ # - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).
23
+ #
24
+ # IMPORTANT:
25
+ # 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.
26
+ # 2. Review all 'pii_columns' to ensure no sensitive columns were missed.
27
+ # 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.
28
+ # ===================================================================================
29
+
16
30
  rules:
17
31
  - id: dpdp_standard
18
32
  root_table: public.users
@@ -98,14 +112,14 @@ rules:
98
112
  message_body: HMAC
99
113
  - table: public.abandoned_carts
100
114
  parent: public.users
101
- join: "LOGICAL_LINK (customer_id)"
102
- parent_columns: [customer_id]
115
+ join: "public.users.id = public.abandoned_carts.customer_id"
116
+ parent_columns: [id]
103
117
  child_columns: [customer_id]
104
118
  pii_columns: []
105
119
  - table: public.audit_logs
106
120
  parent: public.users
107
- join: "LOGICAL_LINK (actor_id)"
108
- parent_columns: [actor_id]
121
+ join: "public.users.id = public.audit_logs.actor_id"
122
+ parent_columns: [id]
109
123
  child_columns: [actor_id]
110
124
  # Introspector Confidence: 0.820 (ipv4)
111
125
  pii_columns: [ip_address]
@@ -115,8 +129,8 @@ rules:
115
129
  ip_address: HMAC
116
130
  - table: public.legacy_crm_notes
117
131
  parent: public.users
118
- join: "LOGICAL_LINK (client_id)"
119
- parent_columns: [client_id]
132
+ join: "public.users.id = public.legacy_crm_notes.client_id"
133
+ parent_columns: [id]
120
134
  child_columns: [client_id]
121
135
  # Introspector Confidence: 0.950 (email, indian_mobile)
122
136
  pii_columns: [agent_notes]
@@ -126,8 +140,8 @@ rules:
126
140
  agent_notes: HMAC
127
141
  - table: public.marketing_campaign_clicks
128
142
  parent: public.users
129
- join: "LOGICAL_LINK (target_email)"
130
- parent_columns: [target_email]
143
+ join: "public.users.email = public.marketing_campaign_clicks.target_email"
144
+ parent_columns: [email]
131
145
  child_columns: [target_email]
132
146
  # Introspector Confidence: 0.950 (email)
133
147
  pii_columns: [target_email]
@@ -137,8 +151,8 @@ rules:
137
151
  target_email: HMAC
138
152
  - table: public.third_party_telemetry
139
153
  parent: public.users
140
- join: "LOGICAL_LINK (user_uuid)"
141
- parent_columns: [user_uuid]
154
+ join: "public.users.id = public.third_party_telemetry.user_uuid"
155
+ parent_columns: [id]
142
156
  child_columns: [user_uuid]
143
157
  pii_columns: []
144
158
 
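Once a manifest like the one above has been reviewed, the README says it is locked with an Ed25519 signature so the rules cannot be silently altered. The sketch below shows that general idea using Node's built-in `node:crypto` module (also available under Bun). It is not the CLI's implementation: the manifest text is a shortened stand-in, the key pair is generated in memory instead of being loaded from a key file, and `manifestIsAuthentic` is a hypothetical helper name.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Shortened stand-in for the contents of compliance.worker.yml.
const manifest = Buffer.from(
  "rules:\n  - id: dpdp_standard\n    root_table: public.users\n",
  "utf8",
);

// In-memory key pair for illustration; a real setup would load keys from disk.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Ed25519 hashes internally, so the digest algorithm argument is null.
const signature = sign(null, manifest, privateKey);

// Hypothetical helper: true only if the bytes match what was signed.
function manifestIsAuthentic(content: Buffer, sig: Buffer): boolean {
  return verify(null, content, publicKey, sig);
}

// Simulate someone editing the deletion rules after sign-off.
const tampered = Buffer.from(
  manifest.toString("utf8").replace("public.users", "public.accounts"),
  "utf8",
);

console.log(manifestIsAuthentic(manifest, signature)); // true
console.log(manifestIsAuthentic(tampered, signature)); // false
```

Any change to the signed bytes, even a single renamed table, makes verification fail. This is the property the README's `sign` step depends on.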
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "dpdp-erasure-cli",
3
- "version": "1.0.12",
3
+ "version": "1.1.1",
4
4
  "license": "Apache-2.0",
5
5
  "keywords": [
6
6
  "dpdp",
@@ -54,4 +54,4 @@
54
54
  "postgres": "^3.4.9",
55
55
  "zod": "^4.4.2"
56
56
  }
57
- }
57
+ }
package/report.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "summary": {
3
3
  "rootTable": "public.users",
4
- "generatedAt": "2026-06-12T05:19:07.235Z",
4
+ "generatedAt": "2026-06-12T08:15:53.497Z",
5
5
  "schemaHash": "ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c",
6
6
  "targetCount": 15,
7
7
  "tablesWithPii": 8,
package/report.md CHANGED
@@ -3,7 +3,7 @@
3
3
  ## Summary
4
4
 
5
5
  - Root table: `public.users`
6
- - Generated at: `2026-06-12T05:19:07.235Z`
6
+ - Generated at: `2026-06-12T08:15:53.497Z`
7
7
  - Schema hash: `ea9e816d30fcee6bd4f322f1fb769e853c2e7ee5b19263aaa54f5ef80189212c`
8
8
  - DAG targets: 15
9
9
  - Tables with PII: 8
@@ -10,18 +10,18 @@ import boxen from "boxen";
10
10
  export const UI = {
11
11
  header: (title: string) => {
12
12
  console.log(
13
- boxen(pc.bold(pc.blue(`COMPLIANCE WORKER — ${title.toUpperCase()}`)), {
13
+ boxen(pc.bold(pc.cyan(`COMPLIANCE WORKER — ${title.toUpperCase()}`)), {
14
14
  padding: { top: 0, bottom: 0, left: 2, right: 2 },
15
15
  margin: { top: 1, bottom: 1 },
16
16
  borderStyle: "round",
17
- borderColor: "blue",
17
+ borderColor: "cyan",
18
18
  })
19
19
  );
20
20
  },
21
21
 
22
22
  divider: () => console.log(pc.gray("─".repeat(process.stdout.columns || 60))),
23
23
 
24
- spinner: (text: string): Ora => ora({ text, color: "blue" }).start(),
24
+ spinner: (text: string): Ora => ora({ text, color: "cyan" }).start(),
25
25
 
26
26
  success: (msg: string) => console.log(`\n${pc.green("✔")} ${pc.bold(msg)}`),
27
27
  error: (msg: string) => console.error(`\n${pc.red("✖")} ${pc.bold(msg)}`),
@@ -52,9 +52,9 @@ async function hardDeleteSatelliteRows(
52
52
 
53
53
  while (true) {
54
54
  const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
55
- const deletedRows = await tx<{ id: string | number }[]>`
55
+ const deletedRows = await tx<{ ctid: string }[]>`
56
56
  WITH batch AS (
57
- SELECT id
57
+ SELECT ctid
58
58
  FROM ${tx(appSchema)}.${tx(tableName)}
59
59
  WHERE ${tx(lookupColumn)} = ${lookupValue}
60
60
  ${tenantFilter}
@@ -62,8 +62,8 @@ async function hardDeleteSatelliteRows(
62
62
  FOR UPDATE SKIP LOCKED
63
63
  )
64
64
  DELETE FROM ${tx(appSchema)}.${tx(tableName)}
65
- WHERE id IN (SELECT id FROM batch)
66
- RETURNING id
65
+ WHERE ctid IN (SELECT ctid FROM batch)
66
+ RETURNING ctid
67
67
  `;
68
68
 
69
69
  if (deletedRows.length === 0) {
@@ -3,7 +3,7 @@ import type { Tsql } from "@/types";
3
3
  import { assertIdentifier } from "@/utils";
4
4
 
5
5
  interface SatelliteRowId {
6
- id: number;
6
+ ctid: string;
7
7
  }
8
8
 
9
9
  async function yieldWorkerEventLoop(): Promise<void> {
@@ -78,7 +78,7 @@ export async function redactSatelliteTable(
78
78
  const tenantFilter = tenantId ? tx` AND tenant_id = ${tenantId}` : tx``;
79
79
  const updatedRows = await tx<SatelliteRowId[]>`
80
80
  WITH batch AS (
81
- SELECT id
81
+ SELECT ctid
82
82
  FROM ${tx(schema)}.${tx(table)}
83
83
  WHERE ${tx(safeLookupColumn)} = ${lookupValue}
84
84
  ${tenantFilter}
@@ -87,8 +87,8 @@ export async function redactSatelliteTable(
87
87
  )
88
88
  UPDATE ${tx(schema)}.${tx(table)}
89
89
  SET ${tx(safeLookupColumn)} = ${newHmacValue}
90
- WHERE id IN (SELECT id FROM batch)
91
- RETURNING id
90
+ WHERE ctid IN (SELECT ctid FROM batch)
91
+ RETURNING ctid
92
92
  `;
93
93
 
94
94
  if (updatedRows.length === 0) {
@@ -121,6 +121,10 @@ const CONTENT_SIGNATURES: ContentSignature[] = [
121
121
  ];
122
122
 
123
123
  const METADATA_PATTERNS: Array<{ pattern: RegExp; score: number }> = [
124
+ { pattern: /(^|_)(full_name|first_name|last_name|middle_name|surname|given_name|display_name)($|_)/i, score: STRONG_METADATA_SCORE },
125
+ { pattern: /(^|_)name($|_)/i, score: MEDIUM_METADATA_SCORE },
126
+ { pattern: /(^|_)(password|passwd|pwd|secret|token|api_key|access_token|refresh_token|auth_token|hash|salt)($|_)/i, score: STRONG_METADATA_SCORE },
127
+ { pattern: /(^|_)(role|roles|permission|permissions|group|groups|acl|access_level)($|_)/i, score: WEAK_METADATA_SCORE },
124
128
  { pattern: /(^|_)(email|e_mail|email_address|mail_address|contact_email)($|_)/i, score: STRONG_METADATA_SCORE },
125
129
  { pattern: /(^|_)(phone|mobile|msisdn|telephone|contact_number|whatsapp)(_number|_no)?($|_)/i, score: STRONG_METADATA_SCORE },
126
130
  { pattern: /(^|_)(aadhaar|aadhar|uidai)(_number|_no|_id)?($|_)/i, score: STRONG_METADATA_SCORE },
@@ -57,16 +57,24 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
57
57
  const sourceKey = targetKey(link.sourceTable.schema, link.sourceTable.table);
58
58
  const targetKeyStr = targetKey(link.targetTable.schema, link.targetTable.table);
59
59
 
60
+ let parentCol = link.column;
61
+ let childCol = link.column;
62
+
63
+ // Attempt intelligent primary key mapping for orphaned root links
64
+ if (link.sourceTable.schema === root.schema && link.sourceTable.table === root.table) {
65
+ parentCol = ["target_email", "user_email", "email_address"].includes(link.column) ? "email" : "id";
66
+ }
67
+
60
68
  if (dagTableKeys.has(sourceKey) && !dagTableKeys.has(targetKeyStr)) {
61
69
  dagTableKeys.add(targetKeyStr);
62
70
  logicalTargets.push({
63
71
  table: link.targetTable,
64
72
  parentTable: link.sourceTable,
65
73
  constraintName: null,
66
- childColumns: [link.column],
67
- parentColumns: [link.column],
74
+ childColumns: [childCol],
75
+ parentColumns: [parentCol],
68
76
  depth: maxDepth,
69
- fkCondition: `LOGICAL_LINK (${link.column})`,
77
+ fkCondition: `${formatQualifiedTable(link.sourceTable)}.${parentCol} = ${formatQualifiedTable(link.targetTable)}.${childCol}`,
70
78
  });
71
79
  } else if (dagTableKeys.has(targetKeyStr) && !dagTableKeys.has(sourceKey)) {
72
80
  dagTableKeys.add(sourceKey);
@@ -74,10 +82,10 @@ export async function runIntrospector(options: RunIntrospectorOptions): Promise<
74
82
  table: link.sourceTable,
75
83
  parentTable: link.targetTable,
76
84
  constraintName: null,
77
- childColumns: [link.column],
78
- parentColumns: [link.column],
85
+ childColumns: [childCol],
86
+ parentColumns: [parentCol],
79
87
  depth: maxDepth,
80
- fkCondition: `LOGICAL_LINK (${link.column})`,
88
+ fkCondition: `${formatQualifiedTable(link.targetTable)}.${parentCol} = ${formatQualifiedTable(link.sourceTable)}.${childCol}`,
81
89
  });
82
90
  }
83
91
  }
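The hunk above replaces the `LOGICAL_LINK (...)` placeholder with a real join condition and guesses the parent column from the child column name. The standalone sketch below isolates that heuristic to make it easier to reason about. `QualifiedTable`, `guessParentColumn`, and `buildJoin` are illustrative names, and this `formatQualifiedTable` is an assumed `schema.table` formatter, not the package's own helper, whose body is not shown in the diff.

```typescript
interface QualifiedTable {
  schema: string;
  table: string;
}

// Child column names that the diff maps to the parent's "email" column.
const EMAIL_LIKE_COLUMNS = ["target_email", "user_email", "email_address"];

// Assumed behavior: render a table as "schema.table".
function formatQualifiedTable(t: QualifiedTable): string {
  return `${t.schema}.${t.table}`;
}

// Mirrors the ternary in the diff: email-like columns join on "email", all others on "id".
function guessParentColumn(childColumn: string): string {
  return EMAIL_LIKE_COLUMNS.includes(childColumn) ? "email" : "id";
}

function buildJoin(parent: QualifiedTable, child: QualifiedTable, childColumn: string): string {
  const parentColumn = guessParentColumn(childColumn);
  return `${formatQualifiedTable(parent)}.${parentColumn} = ${formatQualifiedTable(child)}.${childColumn}`;
}

const users: QualifiedTable = { schema: "public", table: "users" };

console.log(buildJoin(users, { schema: "public", table: "abandoned_carts" }, "customer_id"));
// public.users.id = public.abandoned_carts.customer_id
console.log(buildJoin(users, { schema: "public", table: "marketing_campaign_clicks" }, "target_email"));
// public.users.email = public.marketing_campaign_clicks.target_email
```

Both outputs match the `join` values in the regenerated manifest. The fallback to `id` is a guess, so a root table keyed on something else (for example a `uuid` column) would still need the manual review that the manifest header asks for.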
@@ -83,6 +83,20 @@ export function renderIntrospectorYaml(draft: IntrospectorDraft): string {
83
83
  "legal_disclaimer:",
84
84
  " text: \"Auto-generated by Compliance Worker. The DPO/Developer is responsible for verifying all logical links and PII mappings.\"",
85
85
  "",
86
+ "# ===================================================================================",
87
+ "# HOW TO READ THIS MANIFEST:",
88
+ "# - 'targets': The list of tables the worker will delete/redact from.",
89
+ "# - 'parent': The table that owns this data. The worker deletes parent-first or child-first depending on the DB constraints.",
90
+ "# - 'join': The SQL condition used to link the child table to the parent table.",
91
+ "# - 'pii_columns': Columns identified as containing Personal Identifiable Information.",
92
+ "# - 'action': 'redact' (anonymizes the row but keeps it) or implicitly 'delete' (removes the row entirely).",
93
+ "#",
94
+ "# IMPORTANT:",
95
+ "# 1. Review all 'join' conditions. If the Introspector guessed a join for an orphaned table, verify it.",
96
+ "# 2. Review all 'pii_columns' to ensure no sensitive columns were missed.",
97
+ "# 3. Replace 'PENDING_REVIEW' in legal_attestation once verified.",
98
+ "# ===================================================================================",
99
+ "",
86
100
  "rules:",
87
101
  " - id: dpdp_standard",
88
102
  ` root_table: ${yamlScalar(formatQualifiedTable(draft.root))}`,
@@ -53,10 +53,10 @@ describe("Introspector PII classifier", () => {
53
53
  expect(classifyLeaf("560001", "postal_code")).toContain("indian_pin_code");
54
54
  });
55
55
 
56
- it("does not infer personal names without a dedicated NER model", () => {
57
- expect(metadataScore("full_name")).toBe(0);
58
- expect(metadataScore("first_name")).toBe(0);
59
- expect(metadataScore("customer_name")).toBe(0);
56
+ it("infers personal names based on standard developer metadata patterns", () => {
57
+ expect(metadataScore("full_name")).toBe(0.92);
58
+ expect(metadataScore("first_name")).toBe(0.92);
59
+ expect(metadataScore("customer_name")).toBe(0.82);
60
60
  expect(classifyLeaf("Priya Sharma")).toEqual([]);
61
61
  });
62
62
  });
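The updated test expects `full_name` to score 0.92 and `customer_name` to score 0.82, which follows from the two name patterns added to `METADATA_PATTERNS`. The sketch below reproduces only those two patterns. The numeric values of `STRONG_METADATA_SCORE` and `MEDIUM_METADATA_SCORE` are inferred from the test expectations, and taking the highest matching score is an assumption about how the real `metadataScore` combines patterns.

```typescript
// Values inferred from the updated test expectations, not read from the source.
const STRONG_METADATA_SCORE = 0.92;
const MEDIUM_METADATA_SCORE = 0.82;

// The two name-related patterns copied from the diff.
const NAME_PATTERNS: Array<{ pattern: RegExp; score: number }> = [
  {
    pattern: /(^|_)(full_name|first_name|last_name|middle_name|surname|given_name|display_name)($|_)/i,
    score: STRONG_METADATA_SCORE,
  },
  { pattern: /(^|_)name($|_)/i, score: MEDIUM_METADATA_SCORE },
];

// Assumption: the column's score is the highest score among matching patterns.
function metadataScore(column: string): number {
  let best = 0;
  for (const { pattern, score } of NAME_PATTERNS) {
    if (pattern.test(column)) best = Math.max(best, score);
  }
  return best;
}

console.log(metadataScore("full_name"));     // 0.92
console.log(metadataScore("customer_name")); // 0.82
console.log(metadataScore("username"));      // 0
console.log(metadataScore("info"));          // 0
```

The `(^|_)` and `($|_)` anchors require `name` to be a whole underscore-delimited token, so `username` does not match. The `info` case shows the limitation that the README describes for generically named columns.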
@@ -201,6 +201,7 @@ describe("Offline Introspector", () => {
201
201
  expect(users?.piiColumns.map((column) => column.column).sort()).toEqual([
202
202
  "card_number",
203
203
  "email",
204
+ "full_name",
204
205
  "gstin",
205
206
  "phone",
206
207
  "upi_id",
@@ -215,7 +216,6 @@ describe("Offline Introspector", () => {
215
216
  expect(yaml).toContain(`root_table: ${schema}.users`);
216
217
  expect(yaml).toContain(`table: ${schema}.profiles`);
217
218
  expect(yaml).toContain("pii_columns: [pan, aadhaar_payload, nested_payload]");
218
- expect(yaml).not.toContain("full_name");
219
219
  expect(yaml).toContain("schema_hash:");
220
220
  expect(yaml).toContain("generated_by: compliance-introspector-v1");
221
221
  expect(yaml).toContain("legal_disclaimer:");