data_redactor 0.7.2 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 24e5a8c5434fa45e6465033d7a2792a94c0b8f0d9f3ff3ca49a3a6c3c6e4065b
- data.tar.gz: dbe07535163dd1926ce11495aee8b8e2476eb0750a7b028c7af6a7428c6f39c5
+ metadata.gz: aae84ce43ab8d2ad6751ade655397480057d687bdff7a6b857ee2821dffeb91b
+ data.tar.gz: 6e01ebe9d76e64ac3a93c31f14f7089ddaff4645e0819dec675d7585e37c4078
  SHA512:
- metadata.gz: af2e9c869e54e7694eec166586f4baa3b16121c02f10f160db1a422ef9f6c87237ebab3c210bd950a4ab60551e3a6b32e8ccb0d2cfd695b90343cbbd5f8bd6a0
- data.tar.gz: 9ac096890d738968525e50d3b0d5517f92886899176991ee3ee49a218398b5b47e470ff46b4c41fa458bc3a265f0069aaef2b4010e1296708e2470b7d8638b83
+ metadata.gz: a80b34b6e35fdf97cca2d9fecf1cb136b0e0c676ca1e3080c3680aaeb41b442cb2400c371e38417703910fc023ad01a3cf61f2f4a8f8dc5d4bd681174420d2b4
+ data.tar.gz: 4dbf049d027385c21a721044ac6651f4b24b33500af98a0c4c88d7860c06eb2721ea5aab4b8793f6fcd5614cac0cc645c1f67e73b5ebf8d7ff04ead58b67a244
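The `checksums.yaml` entries above are digests of the two archives packed inside the `.gem` file. A minimal sketch for recomputing them locally, assuming the 0.8.0 gem has already been downloaded as `data_redactor-0.8.0.gem` (the file name and both helper methods are illustrative, not part of the gem). It reads the outer tar archive with RubyGems' `Gem::Package::TarReader` and hashes each member with the standard `Digest` library:

```ruby
require "digest"
require "rubygems/package"

# Hex digest of +bytes+ for an algorithm name as it appears in checksums.yaml.
def digest_hex(bytes, algorithm)
  case algorithm
  when "SHA256" then Digest::SHA256.hexdigest(bytes)
  when "SHA512" then Digest::SHA512.hexdigest(bytes)
  else raise ArgumentError, "unsupported algorithm: #{algorithm}"
  end
end

# A .gem file is a plain (uncompressed) tar archive whose members include
# metadata.gz and data.tar.gz. Hash exactly those two members.
def gem_member_digests(gem_path, algorithm = "SHA256")
  digests = {}
  File.open(gem_path, "rb") do |io|
    reader = Gem::Package::TarReader.new(io)
    reader.each do |entry|
      next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
      digests[entry.full_name] = digest_hex(entry.read || "", algorithm)
    end
  end
  digests
end

# Usage (assumes the gem was fetched with `gem fetch data_redactor -v 0.8.0`):
#   gem_member_digests("data_redactor-0.8.0.gem", "SHA256")
#   # compare each value with the "+" SHA256 lines in checksums.yaml
```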
data/CHANGELOG.md CHANGED
@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ### Added
+ - `DataRedactor.redact_deep(data, only:, except:, placeholder:)` — recursively redacts every String value in a nested Hash/Array structure. Non-string scalars (Integer, Float, nil, Boolean) and Hash keys are passed through unchanged. Returns a deep copy; never mutates the input. Raises `ArgumentError` on circular references.
+ - `DataRedactor.redact_json(json_string, only:, except:, placeholder:)` — parses JSON, redacts via `redact_deep`, and returns valid JSON. Raises `JSON::ParserError` on invalid input.
+ - HashiCorp Vault service tokens (`hvs.` prefix, 90–120 chars) — pattern `hashicorp_vault_service_token`
+ - HashiCorp Vault batch tokens (`hvb.` prefix, 138–300 chars) — pattern `hashicorp_vault_batch_token`
+ - HashiCorp Terraform Cloud API tokens (`<14-char-id>.atlasv1.<token>`) — pattern `hashicorp_terraform_api_token`
+
+ All three HashiCorp patterns are tagged `:credentials` and do not require word-boundary wrapping (distinctive prefixes eliminate false positives).
+
  ## [0.7.2] - 2026-05-09
 
  **Supersedes 0.7.1, which has been yanked from RubyGems.**
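The three new credential patterns are plain POSIX ERE strings (see the `pattern_strings` hunk). Because they use only bracket expressions, escaped dots and `{m,n}` bounds, the same source text also compiles as a Ruby `Regexp`, which makes the length bounds quoted in the changelog easy to sanity-check. A small sketch under that assumption; the tokens are synthetic strings built for the check, not real credentials, and `hashicorp_matches` is a hypothetical helper:

```ruby
# Pattern sources copied from the 0.8.0 pattern_strings table, with the C
# string escaping ("\\.") reduced to a single regex escape ("\.").
HASHICORP_PATTERNS = {
  "hashicorp_vault_service_token" => /hvs\.[A-Za-z0-9_-]{90,120}/,
  "hashicorp_vault_batch_token"   => /hvb\.[A-Za-z0-9_-]{138,300}/,
  "hashicorp_terraform_api_token" => /[A-Za-z0-9]{14}\.atlasv1\.[A-Za-z0-9_=-]{60,70}/
}.freeze

# Names of every pattern that matches somewhere in +text+.
def hashicorp_matches(text)
  HASHICORP_PATTERNS.select { |_name, re| re.match?(text) }.keys
end

# Synthetic tokens sized inside each pattern's length bounds.
service = "hvs." + "a" * 95
batch   = "hvb." + "b" * 150
tfc     = "AbCdEfGhIjKlMn.atlasv1." + "c" * 64

hashicorp_matches("token=#{service}")  # => ["hashicorp_vault_service_token"]
hashicorp_matches("hvs." + "a" * 80)   # => [] (below the 90-char minimum)
```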
@@ -56,67 +56,70 @@ const int boundary_wrapped[NUM_PATTERNS] = {
  0, /* 26: Scaleway Access Key */
  0, /* 27: PEM private key header (generic) */
  0, /* 28: GPG Private Key Block */
+ 0, /* 29: HashiCorp Vault Service Token (hvs.) */
+ 0, /* 30: HashiCorp Vault Batch Token (hvb.) */
+ 0, /* 31: HashiCorp Terraform Cloud API Token (atlasv1) */
  /* ---- Tier 3: IBANs (longest → shortest) ---- */
- 0, /* 29: Hungary IBAN (28 chars) */
- 0, /* 30: Poland IBAN (28 chars) */
- 0, /* 31: France IBAN (27 chars) */
- 0, /* 32: Italy IBAN (27 chars) */
- 0, /* 33: Portugal IBAN (25 chars) */
- 0, /* 34: Spain IBAN (24 chars) */
- 0, /* 35: Czechia IBAN (24 chars) */
- 0, /* 36: Romania IBAN (24 chars) */
- 0, /* 37: Sweden IBAN (24 chars) */
- 0, /* 38: Germany IBAN (22 chars) */
- 0, /* 39: Ireland IBAN (22 chars) */
- 0, /* 40: Switzerland IBAN (21 chars) */
- 0, /* 41: Austria IBAN (20 chars) */
- 0, /* 42: Netherlands IBAN (18 chars) */
- 0, /* 43: Denmark IBAN (18 chars) */
- 0, /* 44: Finland IBAN (18 chars) */
- 0, /* 45: Belgium IBAN (16 chars) */
- 0, /* 46: Norway IBAN (15 chars) */
+ 0, /* 32: Hungary IBAN (28 chars) */
+ 0, /* 33: Poland IBAN (28 chars) */
+ 0, /* 34: France IBAN (27 chars) */
+ 0, /* 35: Italy IBAN (27 chars) */
+ 0, /* 36: Portugal IBAN (25 chars) */
+ 0, /* 37: Spain IBAN (24 chars) */
+ 0, /* 38: Czechia IBAN (24 chars) */
+ 0, /* 39: Romania IBAN (24 chars) */
+ 0, /* 40: Sweden IBAN (24 chars) */
+ 0, /* 41: Germany IBAN (22 chars) */
+ 0, /* 42: Ireland IBAN (22 chars) */
+ 0, /* 43: Switzerland IBAN (21 chars) */
+ 0, /* 44: Austria IBAN (20 chars) */
+ 0, /* 45: Netherlands IBAN (18 chars) */
+ 0, /* 46: Denmark IBAN (18 chars) */
+ 0, /* 47: Finland IBAN (18 chars) */
+ 0, /* 48: Belgium IBAN (16 chars) */
+ 0, /* 49: Norway IBAN (15 chars) */
  /* ---- Tier 4: Structured formats (dots, dashes, slashes, @) ---- */
- 0, /* 47: Email Address */
- 0, /* 48: International Phone Number */
- 0, /* 49: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
- 0, /* 50: Brazilian CPF (XXX.XXX.XXX-XX) */
- 0, /* 51: UUID v4 */
- 0, /* 52: IPv4 address */
- 0, /* 53: Credit card numbers */
- 0, /* 54: Indian Aadhaar (XXXX XXXX XXXX) */
+ 0, /* 50: Email Address */
+ 0, /* 51: International Phone Number */
+ 0, /* 52: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
+ 0, /* 53: Brazilian CPF (XXX.XXX.XXX-XX) */
+ 0, /* 54: UUID v4 */
+ 0, /* 55: IPv4 address */
+ 0, /* 56: Credit card numbers */
+ 0, /* 57: Indian Aadhaar (XXXX XXXX XXXX) */
  /* ---- Tier 5: Letter-anchored patterns ---- */
- 0, /* 55: Mexican CURP (18 alphanum, distinctive structure) */
- 0, /* 56: Italian CF with omocodia (16 chars) */
- 0, /* 57: Italian CF basic (16 chars) */
- 0, /* 58: UK National Insurance Number */
- 0, /* 59: Spanish NIE (X/Y/Z prefix) */
- 0, /* 60: Passport letter prefix + digits */
+ 0, /* 58: Mexican CURP (18 alphanum, distinctive structure) */
+ 0, /* 59: Italian CF with omocodia (16 chars) */
+ 0, /* 60: Italian CF basic (16 chars) */
+ 0, /* 61: UK National Insurance Number */
+ 0, /* 62: Spanish NIE (X/Y/Z prefix) */
+ 0, /* 63: Passport letter prefix + digits */
  /* ---- Tier 6: Boundary-wrapped structured (dash/dot/slash separated) ---- */
- 1, /* 61: South Korean RRN (YYMMDD-XXXXXXX, 14 chars) */
- 1, /* 62: Swiss AHV Number (756.XXXX.XXXX.XX) */
- 1, /* 63: Finnish HETU (DDMMYY[+-A]XXXC) */
- 1, /* 64: Swedish Personnummer (YYMMDD[-+]XXXX) */
- 1, /* 65: Danish CPR Number (DDMMYY-XXXX) */
- 1, /* 66: Czech Rodné číslo (YYMMDD/XXXX) */
- 1, /* 67: US Social Security Number (XXX-XX-XXXX) */
- 1, /* 68: US ITIN (9XX-XX-XXXX) */
- 1, /* 69: Canadian SIN (XXX-XXX-XXX) */
- 1, /* 70: Australian TFN (XXX-XXX-XXX) */
- 1, /* 71: Indian PAN (AAAAA0000A) */
- 1, /* 72: Spanish DNI (8 digits + letter) */
- 1, /* 73: Hungarian Tax ID (8XXXXXXXXX, 10 digits) */
+ 1, /* 64: South Korean RRN (YYMMDD-XXXXXXX, 14 chars) */
+ 1, /* 65: Swiss AHV Number (756.XXXX.XXXX.XX) */
+ 1, /* 66: Finnish HETU (DDMMYY[+-A]XXXC) */
+ 1, /* 67: Swedish Personnummer (YYMMDD[-+]XXXX) */
+ 1, /* 68: Danish CPR Number (DDMMYY-XXXX) */
+ 1, /* 69: Czech Rodné číslo (YYMMDD/XXXX) */
+ 1, /* 70: US Social Security Number (XXX-XX-XXXX) */
+ 1, /* 71: US ITIN (9XX-XX-XXXX) */
+ 1, /* 72: Canadian SIN (XXX-XXX-XXX) */
+ 1, /* 73: Australian TFN (XXX-XXX-XXX) */
+ 1, /* 74: Indian PAN (AAAAA0000A) */
+ 1, /* 75: Spanish DNI (8 digits + letter) */
+ 1, /* 76: Hungarian Tax ID (8XXXXXXXXX, 10 digits) */
  /* ---- Tier 7: Boundary-wrapped pure digits (longest → shortest) ---- */
- 1, /* 74: French NIR (15 digits) */
- 1, /* 75: South African ID (13 digits) */
- 1, /* 76: Romanian CNP (13 digits) */
- 1, /* 77: Japanese My Number (12 digits) */
- 1, /* 78: Polish PESEL (11 digits) */
- 1, /* 79: Belgian National Number (11 digits) */
- 1, /* 80: Norwegian Fødselsnummer (11 digits) */
- 1, /* 81: Passport 9 digits */
- 1, /* 82: Dutch BSN (8-9 digits) */
- 1, /* 83: Austrian Abgabenkontonummer (9 digits) */
- 1 /* 84: Polish PESEL duplicate */
+ 1, /* 77: French NIR (15 digits) */
+ 1, /* 78: South African ID (13 digits) */
+ 1, /* 79: Romanian CNP (13 digits) */
+ 1, /* 80: Japanese My Number (12 digits) */
+ 1, /* 81: Polish PESEL (11 digits) */
+ 1, /* 82: Belgian National Number (11 digits) */
+ 1, /* 83: Norwegian Fødselsnummer (11 digits) */
+ 1, /* 84: Passport 9 digits */
+ 1, /* 85: Dutch BSN (8-9 digits) */
+ 1, /* 86: Austrian Abgabenkontonummer (9 digits) */
+ 1 /* 87: Polish PESEL duplicate */
  };
 
  /*
@@ -124,56 +127,57 @@ const int boundary_wrapped[NUM_PATTERNS] = {
  * patterns run when the caller passes a mask (only/except).
  */
  const int pattern_tags[NUM_PATTERNS] = {
- /* 0-28: secrets, API keys, tokens, private keys, webhooks */
+ /* 0-31: secrets, API keys, tokens, private keys, webhooks */
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
  TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
- /* 29-46: IBANs */
+ TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
+ /* 32-49: IBANs */
  TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
  TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
  TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
  TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
- TAG_CONTACT, /* 47: email */
- TAG_CONTACT, /* 48: phone */
- TAG_TAX_ID, /* 49: Brazilian CNPJ */
- TAG_TAX_ID, /* 50: Brazilian CPF */
- TAG_OTHER, /* 51: UUID v4 */
- TAG_NETWORK, /* 52: IPv4 */
- TAG_FINANCIAL, /* 53: credit card */
- TAG_NATIONAL_ID, /* 54: Indian Aadhaar */
- TAG_NATIONAL_ID, /* 55: Mexican CURP */
- TAG_TAX_ID, /* 56: Italian CF (omocodia) */
- TAG_TAX_ID, /* 57: Italian CF (basic) */
- TAG_NATIONAL_ID, /* 58: UK NIN */
- TAG_NATIONAL_ID, /* 59: Spanish NIE */
- TAG_TRAVEL, /* 60: passport letter prefix */
- TAG_NATIONAL_ID, /* 61: Korean RRN */
- TAG_NATIONAL_ID, /* 62: Swiss AHV */
- TAG_NATIONAL_ID, /* 63: Finnish HETU */
- TAG_NATIONAL_ID, /* 64: Swedish Personnummer */
- TAG_NATIONAL_ID, /* 65: Danish CPR */
- TAG_NATIONAL_ID, /* 66: Czech Rodné číslo */
- TAG_NATIONAL_ID, /* 67: US SSN */
- TAG_TAX_ID, /* 68: US ITIN */
- TAG_NATIONAL_ID, /* 69: Canadian SIN */
- TAG_TAX_ID, /* 70: Australian TFN */
- TAG_TAX_ID, /* 71: Indian PAN */
- TAG_NATIONAL_ID, /* 72: Spanish DNI */
- TAG_TAX_ID, /* 73: Hungarian Tax ID */
- TAG_NATIONAL_ID, /* 74: French NIR */
- TAG_NATIONAL_ID, /* 75: South African ID */
- TAG_NATIONAL_ID, /* 76: Romanian CNP */
- TAG_TAX_ID, /* 77: Japanese My Number */
- TAG_NATIONAL_ID, /* 78: Polish PESEL */
- TAG_NATIONAL_ID, /* 79: Belgian National Number */
- TAG_NATIONAL_ID, /* 80: Norwegian Fødselsnummer */
- TAG_TRAVEL, /* 81: passport 9 digits */
- TAG_NATIONAL_ID, /* 82: Dutch BSN */
- TAG_TAX_ID, /* 83: Austrian Abgabenkontonummer */
- TAG_NATIONAL_ID /* 84: Polish PESEL duplicate */
+ TAG_CONTACT, /* 50: email */
+ TAG_CONTACT, /* 51: phone */
+ TAG_TAX_ID, /* 52: Brazilian CNPJ */
+ TAG_TAX_ID, /* 53: Brazilian CPF */
+ TAG_OTHER, /* 54: UUID v4 */
+ TAG_NETWORK, /* 55: IPv4 */
+ TAG_FINANCIAL, /* 56: credit card */
+ TAG_NATIONAL_ID, /* 57: Indian Aadhaar */
+ TAG_NATIONAL_ID, /* 58: Mexican CURP */
+ TAG_TAX_ID, /* 59: Italian CF (omocodia) */
+ TAG_TAX_ID, /* 60: Italian CF (basic) */
+ TAG_NATIONAL_ID, /* 61: UK NIN */
+ TAG_NATIONAL_ID, /* 62: Spanish NIE */
+ TAG_TRAVEL, /* 63: passport letter prefix */
+ TAG_NATIONAL_ID, /* 64: Korean RRN */
+ TAG_NATIONAL_ID, /* 65: Swiss AHV */
+ TAG_NATIONAL_ID, /* 66: Finnish HETU */
+ TAG_NATIONAL_ID, /* 67: Swedish Personnummer */
+ TAG_NATIONAL_ID, /* 68: Danish CPR */
+ TAG_NATIONAL_ID, /* 69: Czech Rodné číslo */
+ TAG_NATIONAL_ID, /* 70: US SSN */
+ TAG_TAX_ID, /* 71: US ITIN */
+ TAG_NATIONAL_ID, /* 72: Canadian SIN */
+ TAG_TAX_ID, /* 73: Australian TFN */
+ TAG_TAX_ID, /* 74: Indian PAN */
+ TAG_NATIONAL_ID, /* 75: Spanish DNI */
+ TAG_TAX_ID, /* 76: Hungarian Tax ID */
+ TAG_NATIONAL_ID, /* 77: French NIR */
+ TAG_NATIONAL_ID, /* 78: South African ID */
+ TAG_NATIONAL_ID, /* 79: Romanian CNP */
+ TAG_TAX_ID, /* 80: Japanese My Number */
+ TAG_NATIONAL_ID, /* 81: Polish PESEL */
+ TAG_NATIONAL_ID, /* 82: Belgian National Number */
+ TAG_NATIONAL_ID, /* 83: Norwegian Fødselsnummer */
+ TAG_TRAVEL, /* 84: passport 9 digits */
+ TAG_NATIONAL_ID, /* 85: Dutch BSN */
+ TAG_TAX_ID, /* 86: Austrian Abgabenkontonummer */
+ TAG_NATIONAL_ID /* 87: Polish PESEL duplicate */
  };
 
  const char *pattern_names[NUM_PATTERNS] = {
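Each index in `pattern_tags` carries exactly one tag, and the `only:`/`except:` filters can select patterns either by tag or by name (the Ruby helper `pattern_enabled?`, later in this diff, receives separate tag bitmasks and name lists for each filter). A sketch of how such a check could combine them, assuming Symbols select tags and Strings select pattern names, as in the README's `only: :contact, except: ["email"]` example. The bit values and both helper names are hypothetical and not taken from the extension:

```ruby
# Hypothetical tag bits; the real TAG_* values are defined in the C
# extension and are not shown in this diff.
TAG_BITS = {
  credentials: 1 << 0, financial: 1 << 1, contact: 1 << 2, tax_id: 1 << 3,
  national_id: 1 << 4, travel: 1 << 5, network: 1 << 6, other: 1 << 7
}.freeze

# Split an only:/except: argument into [tag_bitmask, name_list].
# Assumption: Symbols name tags, Strings name individual patterns.
def split_filter(arg)
  bits = 0
  names = []
  Array(arg).each do |item|
    if item.is_a?(Symbol)
      bits |= TAG_BITS.fetch(item)
    else
      names << item.to_s
    end
  end
  [bits, names]
end

# A pattern runs if it passes only: (when given) and is not hit by except:.
def pattern_runs?(name, tag, only: nil, except: nil)
  only_bits, only_names = split_filter(only)
  except_bits, except_names = split_filter(except)
  bit = TAG_BITS.fetch(tag)
  included = only.nil? || (only_bits & bit) != 0 || only_names.include?(name)
  excluded = (except_bits & bit) != 0 || except_names.include?(name)
  included && !excluded
end

pattern_runs?("hashicorp_vault_service_token", :credentials, only: :credentials)  # => true
pattern_runs?("email", :contact, only: :contact, except: ["email"])               # => false
pattern_runs?("phone_e164", :contact, only: :contact, except: ["email"])          # => true
```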
@@ -206,62 +210,65 @@ const char *pattern_names[NUM_PATTERNS] = {
  "scaleway_access_key", /* 26 */
  "pem_private_key", /* 27 */
  "gpg_private_key", /* 28 */
- "iban_hu", /* 29 */
- "iban_pl", /* 30 */
- "iban_fr", /* 31 */
- "iban_it", /* 32 */
- "iban_pt", /* 33 */
- "iban_es", /* 34 */
- "iban_cz", /* 35 */
- "iban_ro", /* 36 */
- "iban_se", /* 37 */
- "iban_de", /* 38 */
- "iban_ie", /* 39 */
- "iban_ch", /* 40 */
- "iban_at", /* 41 */
- "iban_nl", /* 42 */
- "iban_dk", /* 43 */
- "iban_fi", /* 44 */
- "iban_be", /* 45 */
- "iban_no", /* 46 */
- "email", /* 47 */
- "phone_e164", /* 48 */
- "brazilian_cnpj", /* 49 */
- "brazilian_cpf", /* 50 */
- "uuid_v4", /* 51 */
- "ipv4", /* 52 */
- "credit_card", /* 53 */
- "indian_aadhaar", /* 54 */
- "mexican_curp", /* 55 */
- "italian_cf_omocodia", /* 56 */
- "italian_cf", /* 57 */
- "uk_nin", /* 58 */
- "spanish_nie", /* 59 */
- "passport_letter_prefix", /* 60 */
- "korean_rrn", /* 61 */
- "swiss_ahv", /* 62 */
- "finnish_hetu", /* 63 */
- "swedish_personnummer", /* 64 */
- "danish_cpr", /* 65 */
- "czech_rodne_cislo", /* 66 */
- "us_ssn", /* 67 */
- "us_itin", /* 68 */
- "canadian_sin", /* 69 */
- "australian_tfn", /* 70 */
- "indian_pan", /* 71 */
- "spanish_dni", /* 72 */
- "hungarian_tax_id", /* 73 */
- "french_nir", /* 74 */
- "south_african_id", /* 75 */
- "romanian_cnp", /* 76 */
- "japanese_my_number", /* 77 */
- "polish_pesel", /* 78 */
- "belgian_national_number", /* 79 */
- "norwegian_fodselsnummer", /* 80 */
- "passport_9digits", /* 81 */
- "dutch_bsn", /* 82 */
- "austrian_abgabenkontonummer", /* 83 */
- "polish_pesel_2" /* 84 */
+ "hashicorp_vault_service_token", /* 29 */
+ "hashicorp_vault_batch_token", /* 30 */
+ "hashicorp_terraform_api_token", /* 31 */
+ "iban_hu", /* 32 */
+ "iban_pl", /* 33 */
+ "iban_fr", /* 34 */
+ "iban_it", /* 35 */
+ "iban_pt", /* 36 */
+ "iban_es", /* 37 */
+ "iban_cz", /* 38 */
+ "iban_ro", /* 39 */
+ "iban_se", /* 40 */
+ "iban_de", /* 41 */
+ "iban_ie", /* 42 */
+ "iban_ch", /* 43 */
+ "iban_at", /* 44 */
+ "iban_nl", /* 45 */
+ "iban_dk", /* 46 */
+ "iban_fi", /* 47 */
+ "iban_be", /* 48 */
+ "iban_no", /* 49 */
+ "email", /* 50 */
+ "phone_e164", /* 51 */
+ "brazilian_cnpj", /* 52 */
+ "brazilian_cpf", /* 53 */
+ "uuid_v4", /* 54 */
+ "ipv4", /* 55 */
+ "credit_card", /* 56 */
+ "indian_aadhaar", /* 57 */
+ "mexican_curp", /* 58 */
+ "italian_cf_omocodia", /* 59 */
+ "italian_cf", /* 60 */
+ "uk_nin", /* 61 */
+ "spanish_nie", /* 62 */
+ "passport_letter_prefix", /* 63 */
+ "korean_rrn", /* 64 */
+ "swiss_ahv", /* 65 */
+ "finnish_hetu", /* 66 */
+ "swedish_personnummer", /* 67 */
+ "danish_cpr", /* 68 */
+ "czech_rodne_cislo", /* 69 */
+ "us_ssn", /* 70 */
+ "us_itin", /* 71 */
+ "canadian_sin", /* 72 */
+ "australian_tfn", /* 73 */
+ "indian_pan", /* 74 */
+ "spanish_dni", /* 75 */
+ "hungarian_tax_id", /* 76 */
+ "french_nir", /* 77 */
+ "south_african_id", /* 78 */
+ "romanian_cnp", /* 79 */
+ "japanese_my_number", /* 80 */
+ "polish_pesel", /* 81 */
+ "belgian_national_number", /* 82 */
+ "norwegian_fodselsnummer", /* 83 */
+ "passport_9digits", /* 84 */
+ "dutch_bsn", /* 85 */
+ "austrian_abgabenkontonummer", /* 86 */
+ "polish_pesel_2" /* 87 */
  };
 
  /*
@@ -330,126 +337,132 @@ const char *pattern_strings[NUM_PATTERNS] = {
  "-----BEGIN [A-Z ]*PRIVATE KEY-----",
  /* 28: GPG Private Key Block */
  "-----BEGIN PGP PRIVATE KEY BLOCK-----",
+ /* 29: HashiCorp Vault Service Token (hvs. + 90-120 base64url chars) */
+ "hvs\\.[A-Za-z0-9_-]{90,120}",
+ /* 30: HashiCorp Vault Batch Token (hvb. + 138-300 base64url chars) */
+ "hvb\\.[A-Za-z0-9_-]{138,300}",
+ /* 31: HashiCorp Terraform Cloud API Token (14 alphanum + .atlasv1. + 60-70 base64url chars) */
+ "[A-Za-z0-9]{14}\\.atlasv1\\.[A-Za-z0-9_=-]{60,70}",
 
  /* ---- Tier 3: IBANs (longest → shortest) ---- */
- /* 29: Hungary IBAN (HU, 28 chars) */
+ /* 32: Hungary IBAN (HU, 28 chars) */
  "HU[0-9]{2}[0-9]{24}",
- /* 30: Poland IBAN (PL, 28 chars) */
+ /* 33: Poland IBAN (PL, 28 chars) */
  "PL[0-9]{2}[0-9]{24}",
- /* 31: France IBAN (FR, 27 chars) */
+ /* 34: France IBAN (FR, 27 chars) */
  "FR[0-9]{2}[0-9]{10}[A-Z0-9]{11}[0-9]{2}",
- /* 32: Italy IBAN (IT, 27 chars) */
+ /* 35: Italy IBAN (IT, 27 chars) */
  "IT[0-9]{2}[A-Z][0-9]{10}[A-Z0-9]{12}",
- /* 33: Portugal IBAN (PT, 25 chars) */
+ /* 36: Portugal IBAN (PT, 25 chars) */
  "PT[0-9]{2}[0-9]{21}",
- /* 34: Spain IBAN (ES, 24 chars) */
+ /* 37: Spain IBAN (ES, 24 chars) */
  "ES[0-9]{2}[0-9]{20}",
- /* 35: Czechia IBAN (CZ, 24 chars) */
+ /* 38: Czechia IBAN (CZ, 24 chars) */
  "CZ[0-9]{2}[0-9]{20}",
- /* 36: Romania IBAN (RO, 24 chars) */
+ /* 39: Romania IBAN (RO, 24 chars) */
  "RO[0-9]{2}[A-Z]{4}[A-Z0-9]{16}",
- /* 37: Sweden IBAN (SE, 24 chars) */
+ /* 40: Sweden IBAN (SE, 24 chars) */
  "SE[0-9]{2}[0-9]{20}",
- /* 38: Germany IBAN (DE, 22 chars) */
+ /* 41: Germany IBAN (DE, 22 chars) */
  "DE[0-9]{2}[0-9]{18}",
- /* 39: Ireland IBAN (IE, 22 chars) */
+ /* 42: Ireland IBAN (IE, 22 chars) */
  "IE[0-9]{2}[A-Z]{4}[0-9]{14}",
- /* 40: Switzerland IBAN (CH, 21 chars) */
+ /* 43: Switzerland IBAN (CH, 21 chars) */
  "CH[0-9]{2}[0-9]{5}[A-Z0-9]{12}",
- /* 41: Austria IBAN (AT, 20 chars) */
+ /* 44: Austria IBAN (AT, 20 chars) */
  "AT[0-9]{2}[0-9]{16}",
- /* 42: Netherlands IBAN (NL, 18 chars) */
+ /* 45: Netherlands IBAN (NL, 18 chars) */
  "NL[0-9]{2}[A-Z]{4}[0-9]{10}",
- /* 43: Denmark IBAN (DK, 18 chars) */
+ /* 46: Denmark IBAN (DK, 18 chars) */
  "DK[0-9]{2}[0-9]{14}",
- /* 44: Finland IBAN (FI, 18 chars) */
+ /* 47: Finland IBAN (FI, 18 chars) */
  "FI[0-9]{2}[0-9]{14}",
- /* 45: Belgium IBAN (BE, 16 chars) */
+ /* 48: Belgium IBAN (BE, 16 chars) */
  "BE[0-9]{2}[0-9]{12}",
- /* 46: Norway IBAN (NO, 15 chars) */
+ /* 49: Norway IBAN (NO, 15 chars) */
  "NO[0-9]{2}[0-9]{11}",
 
  /* ---- Tier 4: Structured formats (dots, dashes, slashes, @) ---- */
- /* 47: Email Address */
+ /* 50: Email Address */
  "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}",
- /* 48: International Phone Number (E.164) */
+ /* 51: International Phone Number (E.164) */
  "\\+[0-9]{1,3}[- ]?[0-9][0-9 -]{6,13}[0-9]",
- /* 49: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
+ /* 52: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
  "[0-9]{2}\\.[0-9]{3}\\.[0-9]{3}/[0-9]{4}-[0-9]{2}",
- /* 50: Brazilian CPF (XXX.XXX.XXX-XX) */
+ /* 53: Brazilian CPF (XXX.XXX.XXX-XX) */
  "[0-9]{3}\\.[0-9]{3}\\.[0-9]{3}-[0-9]{2}",
- /* 51: UUID v4 / Scaleway Secret Key */
+ /* 54: UUID v4 / Scaleway Secret Key */
  "[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
- /* 52: IPv4 address */
+ /* 55: IPv4 address */
  "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)",
- /* 53: Credit card numbers (Visa, Mastercard, Amex, Discover, JCB) */
+ /* 56: Credit card numbers (Visa, Mastercard, Amex, Discover, JCB) */
  "(4[0-9]{15}|4[0-9]{12}|5[1-5][0-9]{14}|6011[0-9]{12}|65[0-9]{14}|3[47][0-9]{13}|3[068][0-9]{11}|35[0-9]{14})",
- /* 54: Indian Aadhaar (XXXX XXXX XXXX or XXXX-XXXX-XXXX) */
+ /* 57: Indian Aadhaar (XXXX XXXX XXXX or XXXX-XXXX-XXXX) */
  "[0-9]{4}[- ][0-9]{4}[- ][0-9]{4}",
 
  /* ---- Tier 5: Letter-anchored patterns ---- */
- /* 55: Mexican CURP (18 alphanum, distinctive structure) */
+ /* 58: Mexican CURP (18 alphanum, distinctive structure) */
  "[A-Z]{4}[0-9]{6}[HM][A-Z]{5}[A-Z0-9][0-9]",
- /* 56: Italian CF with omocodia (16 chars) */
+ /* 59: Italian CF with omocodia (16 chars) */
  "[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z][0-9LMNPQRSTUV]{3}[A-Z]",
- /* 57: Italian CF basic (16 chars) */
+ /* 60: Italian CF basic (16 chars) */
  "[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]",
- /* 58: UK National Insurance Number (AA 99 99 99 A-D) */
+ /* 61: UK National Insurance Number (AA 99 99 99 A-D) */
  "[A-Z]{2} ?[0-9]{2} ?[0-9]{2} ?[0-9]{2} ?[A-D]",
- /* 59: Spanish NIE (X/Y/Z + 7 digits + letter) */
+ /* 62: Spanish NIE (X/Y/Z + 7 digits + letter) */
  "[XYZ][0-9]{7}[A-Z]",
- /* 60: Passport - letter prefix + digits (e.g. AB1234567) */
+ /* 63: Passport - letter prefix + digits (e.g. AB1234567) */
  "[A-Z]{1,2}[0-9]{6,7}",
 
  /* ---- Tier 6: Boundary-wrapped structured (dash/dot/slash separated) ---- */
- /* 61: South Korean RRN (YYMMDD-XXXXXXX, 14 chars with dash) */
+ /* 64: South Korean RRN (YYMMDD-XXXXXXX, 14 chars with dash) */
  "[0-9]{6}-[0-9]{7}",
- /* 62: Swiss AHV Number (756.XXXX.XXXX.XX) */
+ /* 65: Swiss AHV Number (756.XXXX.XXXX.XX) */
  "756\\.[0-9]{4}\\.[0-9]{4}\\.[0-9]{2}",
- /* 63: Finnish HETU (DDMMYY[+-A]XXXC) */
+ /* 66: Finnish HETU (DDMMYY[+-A]XXXC) */
  "[0-9]{6}[-+A][0-9]{3}[0-9A-Y]",
- /* 64: Swedish Personnummer (YYMMDD[-+]XXXX) */
+ /* 67: Swedish Personnummer (YYMMDD[-+]XXXX) */
  "[0-9]{6}[-+][0-9]{4}",
- /* 65: Danish CPR Number (DDMMYY-XXXX) */
+ /* 68: Danish CPR Number (DDMMYY-XXXX) */
  "[0-9]{6}-[0-9]{4}",
- /* 66: Czech Rodné číslo (YYMMDD/XXXX or YYMMDDXXXX) */
+ /* 69: Czech Rodné číslo (YYMMDD/XXXX or YYMMDDXXXX) */
  "[0-9]{6}/?[0-9]{3,4}",
- /* 67: US Social Security Number (XXX-XX-XXXX) */
+ /* 70: US Social Security Number (XXX-XX-XXXX) */
  "[0-9]{3}-[0-9]{2}-[0-9]{4}",
- /* 68: US ITIN (9XX-XX-XXXX) */
+ /* 71: US ITIN (9XX-XX-XXXX) */
  "9[0-9]{2}-[0-9]{2}-[0-9]{4}",
- /* 69: Canadian SIN (XXX-XXX-XXX) */
+ /* 72: Canadian SIN (XXX-XXX-XXX) */
  "[0-9]{3}-[0-9]{3}-[0-9]{3}",
- /* 70: Australian TFN (XXX-XXX-XXX or XXX XXX XXX) */
+ /* 73: Australian TFN (XXX-XXX-XXX or XXX XXX XXX) */
  "[0-9]{3}[- ][0-9]{3}[- ][0-9]{3}",
- /* 71: Indian PAN (5 letters + 4 digits + 1 letter) */
+ /* 74: Indian PAN (5 letters + 4 digits + 1 letter) */
  "[A-Z]{5}[0-9]{4}[A-Z]",
- /* 72: Spanish DNI (8 digits + 1 letter) */
+ /* 75: Spanish DNI (8 digits + 1 letter) */
  "[0-9]{8}[A-Z]",
- /* 73: Hungarian Tax ID (starts with 8, 10 digits) */
+ /* 76: Hungarian Tax ID (starts with 8, 10 digits) */
  "8[0-9]{9}",
 
  /* ---- Tier 7: Boundary-wrapped pure digits (longest → shortest) ---- */
- /* 74: French NIR / Social Security (15 digits) */
+ /* 77: French NIR / Social Security (15 digits) */
  "[12][0-9]{2}[01][0-9][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}",
- /* 75: South African ID (13 digits) */
+ /* 78: South African ID (13 digits) */
  "[0-9]{13}",
- /* 76: Romanian CNP (13 digits, first digit 1-8) */
+ /* 79: Romanian CNP (13 digits, first digit 1-8) */
  "[1-8][0-9]{12}",
- /* 77: Japanese My Number (12 digits) */
+ /* 80: Japanese My Number (12 digits) */
  "[0-9]{12}",
- /* 78: Polish PESEL (11 digits) */
+ /* 81: Polish PESEL (11 digits) */
  "[0-9]{11}",
- /* 79: Belgian National Number (11 digits) */
+ /* 82: Belgian National Number (11 digits) */
  "[0-9]{11}",
- /* 80: Norwegian Fødselsnummer (11 digits) */
+ /* 83: Norwegian Fødselsnummer (11 digits) */
  "[0-9]{11}",
- /* 81: Passport - 9 consecutive digits */
+ /* 84: Passport - 9 consecutive digits */
  "[0-9]{9}",
- /* 82: Dutch BSN (8-9 digits) */
+ /* 85: Dutch BSN (8-9 digits) */
  "[0-9]{8,9}",
- /* 83: Austrian Abgabenkontonummer (9 digits) */
+ /* 86: Austrian Abgabenkontonummer (9 digits) */
  "[0-9]{9}",
- /* 84: Polish PESEL duplicate */
+ /* 87: Polish PESEL duplicate */
  "[0-9]{11}"
  };
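The tier comments say the IBANs and the pure-digit patterns are listed longest to shortest. If the patterns are applied one after another to the same text (an assumption: the matching loop itself is not part of this diff), that order matters wherever no word boundary protects a match: once a shorter pattern has replaced the start of a longer identifier, the longer pattern can no longer see it. A Ruby illustration with two digit patterns from the table, applied without boundaries; `apply_in_order` is a hypothetical helper:

```ruby
# Apply each pattern in turn, replacing every match with a placeholder.
def apply_in_order(text, patterns, placeholder = "[REDACTED]")
  patterns.reduce(text) { |acc, re| acc.gsub(re, placeholder) }
end

thirteen = /[0-9]{13}/      # same source as the South African ID entry
nine     = /[0-9]{9}/       # same source as the 9-digit passport entry
digits   = "1234567890123"  # synthetic 13-digit value

apply_in_order(digits, [thirteen, nine])  # => "[REDACTED]"
apply_in_order(digits, [nine, thirteen])  # => "[REDACTED]0123" (four digits leak)
```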
@@ -3,7 +3,7 @@
 
  #include <regex.h>
 
- #define NUM_PATTERNS 85
+ #define NUM_PATTERNS 88
 
  extern const char *pattern_strings[NUM_PATTERNS];
  extern const int boundary_wrapped[NUM_PATTERNS];
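Because the HashiCorp entries were inserted at index 29 rather than appended, every later slot in the four parallel tables (`boundary_wrapped`, `pattern_tags`, `pattern_names`, `pattern_strings`) moves up by three, which is why `NUM_PATTERNS` goes from 85 to 88 and every later comment index changes. Anything that relied on a numeric index instead of a pattern name would see a different pattern in 0.8.0. A sketch of the mapping, using only the values visible in this diff (the constant and method names are illustrative):

```ruby
OLD_PATTERN_COUNT = 85  # NUM_PATTERNS in 0.7.2
INSERT_AT         = 29  # first HashiCorp slot in 0.8.0
INSERTED          = 3   # hvs., hvb., atlasv1

# Map a 0.7.2 pattern index to its 0.8.0 position.
def new_index(old_index)
  unless (0...OLD_PATTERN_COUNT).cover?(old_index)
    raise ArgumentError, "no such 0.7.2 pattern index: #{old_index}"
  end
  old_index >= INSERT_AT ? old_index + INSERTED : old_index
end

new_index(28)  # => 28 (gpg_private_key, unchanged)
new_index(29)  # => 32 (iban_hu)
new_index(84)  # => 87 (polish_pesel_2, the last slot)
```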
@@ -1,4 +1,4 @@
  module DataRedactor
  # Current gem version. Follows {https://semver.org Semantic Versioning 2.0.0}.
- VERSION = "0.7.2"
+ VERSION = "0.8.0"
  end
data/lib/data_redactor.rb CHANGED
@@ -1,4 +1,5 @@
  require "set"
+ require "json"
  require_relative "data_redactor/version"
  require_relative "data_redactor/data_redactor" # loads the compiled .so
 
@@ -161,6 +162,54 @@ module DataRedactor
  result
  end
 
+ # Recursively redact every String value in a nested Hash/Array structure.
+ #
+ # Walks the structure depth-first. Only String leaves are passed through
+ # {redact}; all other leaf types (Integer, Float, nil, Symbol, Boolean)
+ # are copied unchanged. Hash keys are never modified.
+ #
+ # Returns a deep copy — the original structure is never mutated.
+ #
+ # @param data [Hash, Array, String, Object] the structure to walk.
+ # Any type is accepted; non-String scalars are returned as-is.
+ # @param only [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param except [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param placeholder [String, :tagged, :hash] forwarded to {redact}.
+ # @return [Hash, Array, String, Object] a new structure of the same shape
+ # with all String leaves redacted.
+ # @raise [ArgumentError] if the structure contains a circular reference.
+ #
+ # @example Rails params
+ # safe = DataRedactor.redact_deep(params.to_h)
+ #
+ # @example Mixed filter
+ # DataRedactor.redact_deep(payload, only: :credentials, placeholder: :tagged)
+ def redact_deep(data, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
+ _walk(data, only: only, except: except, placeholder: placeholder, seen: Set.new)
+ end
+
+ # Parse +json_string+, redact every String value in the resulting structure,
+ # and return valid JSON.
+ #
+ # Delegates traversal to {redact_deep}. All keyword arguments are forwarded
+ # to {redact}.
+ #
+ # @param json_string [String] valid JSON input.
+ # @param only [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param except [Symbol, String, Array, nil] forwarded to {redact}.
+ # @param placeholder [String, :tagged, :hash] forwarded to {redact}.
+ # @return [String] a JSON string with all String values redacted.
+ # @raise [JSON::ParserError] if +json_string+ is not valid JSON.
+ #
+ # @example
+ # DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
+ # # => '{"email":"[REDACTED]","count":3}'
+ def redact_json(json_string, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
+ parsed = JSON.parse(json_string)
+ redacted = redact_deep(parsed, only: only, except: except, placeholder: placeholder)
+ JSON.generate(redacted)
+ end
+
  # Register a custom redaction pattern.
  #
  # Patterns must be valid POSIX ERE. Ruby-only syntax (+\d+, +\s+, +\w+,
@@ -317,6 +366,31 @@ module DataRedactor
  bits
  end
 
+ # @api private
+ # Depth-first recursive walker for {redact_deep}.
+ # +seen+ is a Set of object_ids already on the current traversal stack,
+ # used to detect circular references.
+ def _walk(node, only:, except:, placeholder:, seen:)
+ case node
+ when String
+ redact(node, only: only, except: except, placeholder: placeholder)
+ when Hash
+ raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
+ seen.add(node.object_id)
+ result = node.transform_values { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
+ seen.delete(node.object_id)
+ result
+ when Array
+ raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
+ seen.add(node.object_id)
+ result = node.map { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
+ seen.delete(node.object_id)
+ result
+ else
+ node
+ end
+ end
+
  # @api private
  def pattern_enabled?(name, tag_bit, only_present, only_bits, only_names,
  except_bits, except_names)
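`_walk` tracks only the containers on the current path (it deletes each `object_id` from `seen` on the way back up), so a Hash or Array that is merely shared between two branches is walked and copied twice, while a structure that contains itself raises. The sketch below reproduces that traversal outside the gem so the behaviour can be tried without the compiled extension. `fake_redact`, which masks e-mail-like substrings only, is a hypothetical stand-in for `DataRedactor.redact`, and the `ensure` clause is an addition of this sketch:

```ruby
require "set"

# Hypothetical stand-in for DataRedactor.redact: masks e-mail-like
# substrings only (regex copied from the email entry in pattern_strings).
def fake_redact(str)
  str.gsub(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/, "[REDACTED]")
end

# Mirrors the shape of _walk: Strings are redacted, Hash values and Array
# elements are walked, every other leaf is returned as-is. +seen+ holds the
# object_ids of the containers on the current path only.
def walk(node, seen = Set.new)
  case node
  when String
    fake_redact(node)
  when Hash, Array
    raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)

    seen.add(node.object_id)
    begin
      node.is_a?(Hash) ? node.transform_values { |v| walk(v, seen) } : node.map { |v| walk(v, seen) }
    ensure
      seen.delete(node.object_id) # leave the path even if a child raised
    end
  else
    node
  end
end

# A Hash shared by two branches is fine: it is walked (and copied) twice.
shared = { "email" => "bob@example.com" }
copy = walk({ "a" => shared, "b" => shared, "n" => 1 })
# copy["a"] and copy["b"] are separate {"email" => "[REDACTED]"} hashes,
# copy["n"] is 1, and shared["email"] still holds the original address.

# An Array that contains itself raises instead of recursing forever.
looped = [1]
looped << looped
begin
  walk(looped)
rescue ArgumentError => e
  puts e.message # prints "redact_deep: circular reference detected"
end
```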
data/readme.md CHANGED
@@ -103,6 +103,36 @@ DataRedactor.scan(text, except: :network)
  DataRedactor.scan(text, only: :contact, except: ["email"])
  ```
 
+ ### Hash / JSON traversal
+
+ Redact every string value inside a nested Hash or Array — useful for params hashes, Sidekiq job payloads, webhook bodies, and anything that isn't a flat string:
+
+ ```ruby
+ # Hash — returns a deep copy, never mutates the input
+ result = DataRedactor.redact_deep({
+ "user" => { "email" => "alice@example.com" },
+ "count" => 3,
+ "tags" => ["admin", "alice@example.com"]
+ })
+ # => { "user" => { "email" => "[REDACTED]" }, "count" => 3, "tags" => ["admin", "[REDACTED]"] }
+
+ # Hash keys are never touched — only values are redacted
+ # Non-string scalars (Integer, Float, nil, Boolean) pass through unchanged
+
+ # Accepts the same filters as redact
+ DataRedactor.redact_deep(params, only: :credentials)
+ DataRedactor.redact_deep(payload, except: :network, placeholder: :tagged)
+ ```
+
+ ```ruby
+ # JSON string — parse → redact_deep → re-serialise
+ safe_json = DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
+ # => '{"email":"[REDACTED]","count":3}'
+
+ # Raises JSON::ParserError on invalid input
+ DataRedactor.redact_json("not json") # => JSON::ParserError
+ ```
+
  ### Custom patterns
 
  Teams often have internal IDs that the gem can't ship. Register them at boot:
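`redact_json` is a thin pipeline: `JSON.parse`, then `redact_deep`, then `JSON.generate`. One consequence is that `JSON.generate` emits compact output, so indentation and spacing in the input are not preserved, although key order is. The sketch below mirrors that pipeline with a deliberately crude stub, `fake_redact_deep` (hypothetical; it only masks String values containing `@`), in place of the gem's redaction, so it runs without the compiled extension:

```ruby
require "json"

# Hypothetical stand-in for redact_deep: masks any String value containing
# "@". Hash keys and non-String values are left untouched.
def fake_redact_deep(node)
  case node
  when String then node.include?("@") ? "[REDACTED]" : node
  when Hash   then node.transform_values { |v| fake_redact_deep(v) }
  when Array  then node.map { |v| fake_redact_deep(v) }
  else node
  end
end

# Same parse -> redact -> generate pipeline as redact_json in this diff.
def fake_redact_json(json_string)
  JSON.generate(fake_redact_deep(JSON.parse(json_string)))
end

input = <<~JSON
  {
    "email": "alice@example.com",
    "count": 3,
    "active": true,
    "note": null,
    "admin@example.com": "owner"
  }
JSON

output = fake_redact_json(input)
# output is compact JSON in the input's key order; the "@" key is untouched:
# {"email":"[REDACTED]","count":3,"active":true,"note":null,"admin@example.com":"owner"}
```

If a consumer compares the redacted body byte-for-byte with the original (for example, against a signature), this re-serialisation step is worth keeping in mind.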
@@ -179,7 +209,7 @@ Pass an empty subset (e.g. `scrub: [:headers]`) to opt out of body wrapping. For
 
  > **Body wrapping is buffering.** The middleware reads the entire response body into memory before scanning. For streaming endpoints (SSE, large file downloads, Rack::Hijack) use `scrub: [:headers]` and rely on the Logger formatter for application logs instead.
 
- ## Detected patterns (85 total)
+ ## Detected patterns (88 total)
 
  The table below is a representative sample. Use `DataRedactor.pattern_names` for the canonical, machine-readable list — it stays in sync with the C extension automatically.
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: data_redactor
  version: !ruby/object:Gem::Version
- version: 0.7.2
+ version: 0.8.0
  platform: ruby
  authors:
  - Daniele Frisanco