data_redactor 0.7.0 → 0.8.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +67 -1
- data/ext/data_redactor/patterns.c +221 -208
- data/ext/data_redactor/patterns.h +1 -1
- data/lib/data_redactor/version.rb +1 -1
- data/lib/data_redactor.rb +74 -0
- data/readme.md +75 -5
- metadata +17 -6
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: aae84ce43ab8d2ad6751ade655397480057d687bdff7a6b857ee2821dffeb91b
|
|
4
|
+
data.tar.gz: 6e01ebe9d76e64ac3a93c31f14f7089ddaff4645e0819dec675d7585e37c4078
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: a80b34b6e35fdf97cca2d9fecf1cb136b0e0c676ca1e3080c3680aaeb41b442cb2400c371e38417703910fc023ad01a3cf61f2f4a8f8dc5d4bd681174420d2b4
|
|
7
|
+
data.tar.gz: 4dbf049d027385c21a721044ac6651f4b24b33500af98a0c4c88d7860c06eb2721ea5aab4b8793f6fcd5614cac0cc645c1f67e73b5ebf8d7ff04ead58b67a244
|
data/CHANGELOG.md
CHANGED
|
@@ -7,6 +7,70 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|
|
7
7
|
|
|
8
8
|
## [Unreleased]
|
|
9
9
|
|
|
10
|
+
### Added
|
|
11
|
+
- `DataRedactor.redact_deep(data, only:, except:, placeholder:)` — recursively redacts every String value in a nested Hash/Array structure. Non-string scalars (Integer, Float, nil, Boolean) and Hash keys are passed through unchanged. Returns a deep copy; never mutates the input. Raises `ArgumentError` on circular references.
|
|
12
|
+
- `DataRedactor.redact_json(json_string, only:, except:, placeholder:)` — parses JSON, redacts via `redact_deep`, and returns valid JSON. Raises `JSON::ParserError` on invalid input.
|
|
13
|
+
- HashiCorp Vault service tokens (`hvs.` prefix, 90–120 chars) — pattern `hashicorp_vault_service_token`
|
|
14
|
+
- HashiCorp Vault batch tokens (`hvb.` prefix, 138–300 chars) — pattern `hashicorp_vault_batch_token`
|
|
15
|
+
- HashiCorp Terraform Cloud API tokens (`<14-char-id>.atlasv1.<token>`) — pattern `hashicorp_terraform_api_token`
|
|
16
|
+
|
|
17
|
+
All three HashiCorp patterns are tagged `:credentials` and do not require word-boundary wrapping (distinctive prefixes eliminate false positives).
|
|
18
|
+
|
|
19
|
+
## [0.7.2] - 2026-05-09
|
|
20
|
+
|
|
21
|
+
**Supersedes 0.7.1, which has been yanked from RubyGems.**
|
|
22
|
+
|
|
23
|
+
0.7.1 had a release pipeline bug: the source gem and the precompiled native
|
|
24
|
+
gems were published by two independent workflows, with no gating between
|
|
25
|
+
them. When the native-binary builds failed (`oxidize-rb/actions/cross-gem`
|
|
26
|
+
couldn't pull `rbsys/aarch64-linux:0.9.128` from Docker Hub), the source
|
|
27
|
+
gem still published — leaving users with release notes that promised
|
|
28
|
+
precompiled binaries that didn't exist on RubyGems. 0.7.2 ships the same
|
|
29
|
+
features as 0.7.1 plus the pipeline fix.
|
|
30
|
+
|
|
31
|
+
### Changed
|
|
32
|
+
- **Atomic release pipeline.** Source-gem publishing moved out of `ci.yml`
|
|
33
|
+
and into `release-binaries.yml`, alongside the native-gem builds. The
|
|
34
|
+
publish job now `needs: [build-source, build-native]`; if any native
|
|
35
|
+
platform fails to build, **nothing publishes**. This guarantees the
|
|
36
|
+
RubyGems release matches what the GitHub release notes promise.
|
|
37
|
+
- **Direct `rake-compiler-dock` invocation in CI** instead of the
|
|
38
|
+
`oxidize-rb/actions/cross-gem` action. Same code path as `rake gem:all`
|
|
39
|
+
locally and the existing PR-time smoke test in `ci.yml`. Uses
|
|
40
|
+
`ghcr.io/rake-compiler/*` images (no Docker Hub rate limits).
|
|
41
|
+
|
|
42
|
+
### Fixed
|
|
43
|
+
- All 6 precompiled native gems now actually publish on release — the
|
|
44
|
+
`aarch64-linux` variant in particular was previously failing.
|
|
45
|
+
|
|
46
|
+
### Documentation
|
|
47
|
+
- README installation section rewritten around the user's question
|
|
48
|
+
("what changes for me?"). Adds explicit Docker / Alpine guidance and a
|
|
49
|
+
heads-up about `bundle lock --add-platform` for cross-platform deploys.
|
|
50
|
+
|
|
51
|
+
## [0.7.1] - 2026-05-09 [YANKED]
|
|
52
|
+
|
|
53
|
+
### Added
|
|
54
|
+
- **Precompiled native gems** for the most common platforms — installing
|
|
55
|
+
`data_redactor` no longer requires a C toolchain on these targets:
|
|
56
|
+
- `x86_64-linux`, `aarch64-linux` (glibc)
|
|
57
|
+
- `x86_64-linux-musl`, `aarch64-linux-musl` (Alpine)
|
|
58
|
+
- `x86_64-darwin`, `arm64-darwin` (macOS Intel + Apple Silicon)
|
|
59
|
+
Each native gem ships compiled `.so` files for Ruby 3.1, 3.2, 3.3, and 3.4.
|
|
60
|
+
Bundler/RubyGems automatically picks the right gem for the host; users on
|
|
61
|
+
any other platform fall back to the source gem and compile as before.
|
|
62
|
+
- `rake gem:all` task — builds every native gem locally via `rake-compiler-dock`
|
|
63
|
+
(requires Docker). Single command to regenerate the full release matrix.
|
|
64
|
+
- `.github/workflows/release-binaries.yml` — builds & publishes all native
|
|
65
|
+
gems on every GitHub release. Also exposes `workflow_dispatch` so a
|
|
66
|
+
maintainer can rebuild any past release without cutting a new tag.
|
|
67
|
+
|
|
68
|
+
### Changed
|
|
69
|
+
- CI test matrix now includes Ruby 3.4 in addition to 3.1, 3.2, 3.3.
|
|
70
|
+
- Gemspec: added `rake-compiler-dock` as a development dependency. Source-only
|
|
71
|
+
gem size is unchanged — native gems strip `ext/` and the `extconf.rb`
|
|
72
|
+
extension hook so they only carry the prebuilt `.so` files.
|
|
73
|
+
|
|
10
74
|
## [0.7.0] - 2026-05-08
|
|
11
75
|
|
|
12
76
|
### Added
|
|
@@ -106,7 +170,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|
|
106
170
|
- `DataRedactor.redact(text)` module function returning the input with every match replaced by `[REDACTED]`.
|
|
107
171
|
- RSpec suite with one example per pattern.
|
|
108
172
|
|
|
109
|
-
[Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.
|
|
173
|
+
[Unreleased]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.2...HEAD
|
|
174
|
+
[0.7.2]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.1...v0.7.2
|
|
175
|
+
[0.7.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.7.0...v0.7.1
|
|
110
176
|
[0.7.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.6.1...v0.7.0
|
|
111
177
|
[0.6.1]: https://github.com/danielefrisanco/data_redactor/compare/v0.6.0...v0.6.1
|
|
112
178
|
[0.6.0]: https://github.com/danielefrisanco/data_redactor/compare/v0.5.0...v0.6.0
|
data/ext/data_redactor/patterns.c
CHANGED
|
@@ -56,67 +56,70 @@ const int boundary_wrapped[NUM_PATTERNS] = {
|
|
|
56
56
|
0, /* 26: Scaleway Access Key */
|
|
57
57
|
0, /* 27: PEM private key header (generic) */
|
|
58
58
|
0, /* 28: GPG Private Key Block */
|
|
59
|
+
0, /* 29: HashiCorp Vault Service Token (hvs.) */
|
|
60
|
+
0, /* 30: HashiCorp Vault Batch Token (hvb.) */
|
|
61
|
+
0, /* 31: HashiCorp Terraform Cloud API Token (atlasv1) */
|
|
59
62
|
/* ---- Tier 3: IBANs (longest → shortest) ---- */
|
|
60
|
-
0, /*
|
|
61
|
-
0, /*
|
|
62
|
-
0, /*
|
|
63
|
-
0, /*
|
|
64
|
-
0, /*
|
|
65
|
-
0, /*
|
|
66
|
-
0, /*
|
|
67
|
-
0, /*
|
|
68
|
-
0, /*
|
|
69
|
-
0, /*
|
|
70
|
-
0, /*
|
|
71
|
-
0, /*
|
|
72
|
-
0, /*
|
|
73
|
-
0, /*
|
|
74
|
-
0, /*
|
|
75
|
-
0, /*
|
|
76
|
-
0, /*
|
|
77
|
-
0, /*
|
|
63
|
+
0, /* 32: Hungary IBAN (28 chars) */
|
|
64
|
+
0, /* 33: Poland IBAN (28 chars) */
|
|
65
|
+
0, /* 34: France IBAN (27 chars) */
|
|
66
|
+
0, /* 35: Italy IBAN (27 chars) */
|
|
67
|
+
0, /* 36: Portugal IBAN (25 chars) */
|
|
68
|
+
0, /* 37: Spain IBAN (24 chars) */
|
|
69
|
+
0, /* 38: Czechia IBAN (24 chars) */
|
|
70
|
+
0, /* 39: Romania IBAN (24 chars) */
|
|
71
|
+
0, /* 40: Sweden IBAN (24 chars) */
|
|
72
|
+
0, /* 41: Germany IBAN (22 chars) */
|
|
73
|
+
0, /* 42: Ireland IBAN (22 chars) */
|
|
74
|
+
0, /* 43: Switzerland IBAN (21 chars) */
|
|
75
|
+
0, /* 44: Austria IBAN (20 chars) */
|
|
76
|
+
0, /* 45: Netherlands IBAN (18 chars) */
|
|
77
|
+
0, /* 46: Denmark IBAN (18 chars) */
|
|
78
|
+
0, /* 47: Finland IBAN (18 chars) */
|
|
79
|
+
0, /* 48: Belgium IBAN (16 chars) */
|
|
80
|
+
0, /* 49: Norway IBAN (15 chars) */
|
|
78
81
|
/* ---- Tier 4: Structured formats (dots, dashes, slashes, @) ---- */
|
|
79
|
-
0, /*
|
|
80
|
-
0, /*
|
|
81
|
-
0, /*
|
|
82
|
-
0, /*
|
|
83
|
-
0, /*
|
|
84
|
-
0, /*
|
|
85
|
-
0, /*
|
|
86
|
-
0, /*
|
|
82
|
+
0, /* 50: Email Address */
|
|
83
|
+
0, /* 51: International Phone Number */
|
|
84
|
+
0, /* 52: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
|
|
85
|
+
0, /* 53: Brazilian CPF (XXX.XXX.XXX-XX) */
|
|
86
|
+
0, /* 54: UUID v4 */
|
|
87
|
+
0, /* 55: IPv4 address */
|
|
88
|
+
0, /* 56: Credit card numbers */
|
|
89
|
+
0, /* 57: Indian Aadhaar (XXXX XXXX XXXX) */
|
|
87
90
|
/* ---- Tier 5: Letter-anchored patterns ---- */
|
|
88
|
-
0, /*
|
|
89
|
-
0, /*
|
|
90
|
-
0, /*
|
|
91
|
-
0, /*
|
|
92
|
-
0, /*
|
|
93
|
-
0, /*
|
|
91
|
+
0, /* 58: Mexican CURP (18 alphanum, distinctive structure) */
|
|
92
|
+
0, /* 59: Italian CF with omocodia (16 chars) */
|
|
93
|
+
0, /* 60: Italian CF basic (16 chars) */
|
|
94
|
+
0, /* 61: UK National Insurance Number */
|
|
95
|
+
0, /* 62: Spanish NIE (X/Y/Z prefix) */
|
|
96
|
+
0, /* 63: Passport letter prefix + digits */
|
|
94
97
|
/* ---- Tier 6: Boundary-wrapped structured (dash/dot/slash separated) ---- */
|
|
95
|
-
1, /*
|
|
96
|
-
1, /*
|
|
97
|
-
1, /*
|
|
98
|
-
1, /*
|
|
99
|
-
1, /*
|
|
100
|
-
1, /*
|
|
101
|
-
1, /*
|
|
102
|
-
1, /*
|
|
103
|
-
1, /*
|
|
104
|
-
1, /*
|
|
105
|
-
1, /*
|
|
106
|
-
1, /*
|
|
107
|
-
1, /*
|
|
98
|
+
1, /* 64: South Korean RRN (YYMMDD-XXXXXXX, 14 chars) */
|
|
99
|
+
1, /* 65: Swiss AHV Number (756.XXXX.XXXX.XX) */
|
|
100
|
+
1, /* 66: Finnish HETU (DDMMYY[+-A]XXXC) */
|
|
101
|
+
1, /* 67: Swedish Personnummer (YYMMDD[-+]XXXX) */
|
|
102
|
+
1, /* 68: Danish CPR Number (DDMMYY-XXXX) */
|
|
103
|
+
1, /* 69: Czech Rodné číslo (YYMMDD/XXXX) */
|
|
104
|
+
1, /* 70: US Social Security Number (XXX-XX-XXXX) */
|
|
105
|
+
1, /* 71: US ITIN (9XX-XX-XXXX) */
|
|
106
|
+
1, /* 72: Canadian SIN (XXX-XXX-XXX) */
|
|
107
|
+
1, /* 73: Australian TFN (XXX-XXX-XXX) */
|
|
108
|
+
1, /* 74: Indian PAN (AAAAA0000A) */
|
|
109
|
+
1, /* 75: Spanish DNI (8 digits + letter) */
|
|
110
|
+
1, /* 76: Hungarian Tax ID (8XXXXXXXXX, 10 digits) */
|
|
108
111
|
/* ---- Tier 7: Boundary-wrapped pure digits (longest → shortest) ---- */
|
|
109
|
-
1, /*
|
|
110
|
-
1, /*
|
|
111
|
-
1, /*
|
|
112
|
-
1, /*
|
|
113
|
-
1, /*
|
|
114
|
-
1, /*
|
|
115
|
-
1, /*
|
|
116
|
-
1, /*
|
|
117
|
-
1, /*
|
|
118
|
-
1, /*
|
|
119
|
-
1 /*
|
|
112
|
+
1, /* 77: French NIR (15 digits) */
|
|
113
|
+
1, /* 78: South African ID (13 digits) */
|
|
114
|
+
1, /* 79: Romanian CNP (13 digits) */
|
|
115
|
+
1, /* 80: Japanese My Number (12 digits) */
|
|
116
|
+
1, /* 81: Polish PESEL (11 digits) */
|
|
117
|
+
1, /* 82: Belgian National Number (11 digits) */
|
|
118
|
+
1, /* 83: Norwegian Fødselsnummer (11 digits) */
|
|
119
|
+
1, /* 84: Passport 9 digits */
|
|
120
|
+
1, /* 85: Dutch BSN (8-9 digits) */
|
|
121
|
+
1, /* 86: Austrian Abgabenkontonummer (9 digits) */
|
|
122
|
+
1 /* 87: Polish PESEL duplicate */
|
|
120
123
|
};
|
|
121
124
|
|
|
122
125
|
/*
|
|
@@ -124,56 +127,57 @@ const int boundary_wrapped[NUM_PATTERNS] = {
|
|
|
124
127
|
* patterns run when the caller passes a mask (only/except).
|
|
125
128
|
*/
|
|
126
129
|
const int pattern_tags[NUM_PATTERNS] = {
|
|
127
|
-
/* 0-
|
|
130
|
+
/* 0-31: secrets, API keys, tokens, private keys, webhooks */
|
|
128
131
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
129
132
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
130
133
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
131
134
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
132
135
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
133
136
|
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
134
|
-
|
|
137
|
+
TAG_CREDENTIALS, TAG_CREDENTIALS, TAG_CREDENTIALS,
|
|
138
|
+
/* 32-49: IBANs */
|
|
135
139
|
TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
|
|
136
140
|
TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
|
|
137
141
|
TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
|
|
138
142
|
TAG_FINANCIAL, TAG_FINANCIAL, TAG_FINANCIAL,
|
|
139
|
-
TAG_CONTACT, /*
|
|
140
|
-
TAG_CONTACT, /*
|
|
141
|
-
TAG_TAX_ID, /*
|
|
142
|
-
TAG_TAX_ID, /*
|
|
143
|
-
TAG_OTHER, /*
|
|
144
|
-
TAG_NETWORK, /*
|
|
145
|
-
TAG_FINANCIAL, /*
|
|
146
|
-
TAG_NATIONAL_ID, /*
|
|
147
|
-
TAG_NATIONAL_ID, /*
|
|
148
|
-
TAG_TAX_ID, /*
|
|
149
|
-
TAG_TAX_ID, /*
|
|
150
|
-
TAG_NATIONAL_ID, /*
|
|
151
|
-
TAG_NATIONAL_ID, /*
|
|
152
|
-
TAG_TRAVEL, /*
|
|
153
|
-
TAG_NATIONAL_ID, /*
|
|
154
|
-
TAG_NATIONAL_ID, /*
|
|
155
|
-
TAG_NATIONAL_ID, /*
|
|
156
|
-
TAG_NATIONAL_ID, /*
|
|
157
|
-
TAG_NATIONAL_ID, /*
|
|
158
|
-
TAG_NATIONAL_ID, /*
|
|
159
|
-
TAG_NATIONAL_ID, /*
|
|
160
|
-
TAG_TAX_ID, /*
|
|
161
|
-
TAG_NATIONAL_ID, /*
|
|
162
|
-
TAG_TAX_ID, /*
|
|
163
|
-
TAG_TAX_ID, /*
|
|
164
|
-
TAG_NATIONAL_ID, /*
|
|
165
|
-
TAG_TAX_ID, /*
|
|
166
|
-
TAG_NATIONAL_ID, /*
|
|
167
|
-
TAG_NATIONAL_ID, /*
|
|
168
|
-
TAG_NATIONAL_ID, /*
|
|
169
|
-
TAG_TAX_ID, /*
|
|
170
|
-
TAG_NATIONAL_ID, /*
|
|
171
|
-
TAG_NATIONAL_ID, /*
|
|
172
|
-
TAG_NATIONAL_ID, /*
|
|
173
|
-
TAG_TRAVEL, /*
|
|
174
|
-
TAG_NATIONAL_ID, /*
|
|
175
|
-
TAG_TAX_ID, /*
|
|
176
|
-
TAG_NATIONAL_ID /*
|
|
143
|
+
TAG_CONTACT, /* 50: email */
|
|
144
|
+
TAG_CONTACT, /* 51: phone */
|
|
145
|
+
TAG_TAX_ID, /* 52: Brazilian CNPJ */
|
|
146
|
+
TAG_TAX_ID, /* 53: Brazilian CPF */
|
|
147
|
+
TAG_OTHER, /* 54: UUID v4 */
|
|
148
|
+
TAG_NETWORK, /* 55: IPv4 */
|
|
149
|
+
TAG_FINANCIAL, /* 56: credit card */
|
|
150
|
+
TAG_NATIONAL_ID, /* 57: Indian Aadhaar */
|
|
151
|
+
TAG_NATIONAL_ID, /* 58: Mexican CURP */
|
|
152
|
+
TAG_TAX_ID, /* 59: Italian CF (omocodia) */
|
|
153
|
+
TAG_TAX_ID, /* 60: Italian CF (basic) */
|
|
154
|
+
TAG_NATIONAL_ID, /* 61: UK NIN */
|
|
155
|
+
TAG_NATIONAL_ID, /* 62: Spanish NIE */
|
|
156
|
+
TAG_TRAVEL, /* 63: passport letter prefix */
|
|
157
|
+
TAG_NATIONAL_ID, /* 64: Korean RRN */
|
|
158
|
+
TAG_NATIONAL_ID, /* 65: Swiss AHV */
|
|
159
|
+
TAG_NATIONAL_ID, /* 66: Finnish HETU */
|
|
160
|
+
TAG_NATIONAL_ID, /* 67: Swedish Personnummer */
|
|
161
|
+
TAG_NATIONAL_ID, /* 68: Danish CPR */
|
|
162
|
+
TAG_NATIONAL_ID, /* 69: Czech Rodné číslo */
|
|
163
|
+
TAG_NATIONAL_ID, /* 70: US SSN */
|
|
164
|
+
TAG_TAX_ID, /* 71: US ITIN */
|
|
165
|
+
TAG_NATIONAL_ID, /* 72: Canadian SIN */
|
|
166
|
+
TAG_TAX_ID, /* 73: Australian TFN */
|
|
167
|
+
TAG_TAX_ID, /* 74: Indian PAN */
|
|
168
|
+
TAG_NATIONAL_ID, /* 75: Spanish DNI */
|
|
169
|
+
TAG_TAX_ID, /* 76: Hungarian Tax ID */
|
|
170
|
+
TAG_NATIONAL_ID, /* 77: French NIR */
|
|
171
|
+
TAG_NATIONAL_ID, /* 78: South African ID */
|
|
172
|
+
TAG_NATIONAL_ID, /* 79: Romanian CNP */
|
|
173
|
+
TAG_TAX_ID, /* 80: Japanese My Number */
|
|
174
|
+
TAG_NATIONAL_ID, /* 81: Polish PESEL */
|
|
175
|
+
TAG_NATIONAL_ID, /* 82: Belgian National Number */
|
|
176
|
+
TAG_NATIONAL_ID, /* 83: Norwegian Fødselsnummer */
|
|
177
|
+
TAG_TRAVEL, /* 84: passport 9 digits */
|
|
178
|
+
TAG_NATIONAL_ID, /* 85: Dutch BSN */
|
|
179
|
+
TAG_TAX_ID, /* 86: Austrian Abgabenkontonummer */
|
|
180
|
+
TAG_NATIONAL_ID /* 87: Polish PESEL duplicate */
|
|
177
181
|
};
|
|
178
182
|
|
|
179
183
|
const char *pattern_names[NUM_PATTERNS] = {
|
|
@@ -206,62 +210,65 @@ const char *pattern_names[NUM_PATTERNS] = {
|
|
|
206
210
|
"scaleway_access_key", /* 26 */
|
|
207
211
|
"pem_private_key", /* 27 */
|
|
208
212
|
"gpg_private_key", /* 28 */
|
|
209
|
-
"
|
|
210
|
-
"
|
|
211
|
-
"
|
|
212
|
-
"
|
|
213
|
-
"
|
|
214
|
-
"
|
|
215
|
-
"
|
|
216
|
-
"
|
|
217
|
-
"
|
|
218
|
-
"
|
|
219
|
-
"
|
|
220
|
-
"
|
|
221
|
-
"
|
|
222
|
-
"
|
|
223
|
-
"
|
|
224
|
-
"
|
|
225
|
-
"
|
|
226
|
-
"
|
|
227
|
-
"
|
|
228
|
-
"
|
|
229
|
-
"
|
|
230
|
-
"
|
|
231
|
-
"
|
|
232
|
-
"
|
|
233
|
-
"
|
|
234
|
-
"
|
|
235
|
-
"
|
|
236
|
-
"
|
|
237
|
-
"
|
|
238
|
-
"
|
|
239
|
-
"
|
|
240
|
-
"
|
|
241
|
-
"
|
|
242
|
-
"
|
|
243
|
-
"
|
|
244
|
-
"
|
|
245
|
-
"
|
|
246
|
-
"
|
|
247
|
-
"
|
|
248
|
-
"
|
|
249
|
-
"
|
|
250
|
-
"
|
|
251
|
-
"
|
|
252
|
-
"
|
|
253
|
-
"
|
|
254
|
-
"
|
|
255
|
-
"
|
|
256
|
-
"
|
|
257
|
-
"
|
|
258
|
-
"
|
|
259
|
-
"
|
|
260
|
-
"
|
|
261
|
-
"
|
|
262
|
-
"
|
|
263
|
-
"
|
|
264
|
-
"
|
|
213
|
+
"hashicorp_vault_service_token", /* 29 */
|
|
214
|
+
"hashicorp_vault_batch_token", /* 30 */
|
|
215
|
+
"hashicorp_terraform_api_token", /* 31 */
|
|
216
|
+
"iban_hu", /* 32 */
|
|
217
|
+
"iban_pl", /* 33 */
|
|
218
|
+
"iban_fr", /* 34 */
|
|
219
|
+
"iban_it", /* 35 */
|
|
220
|
+
"iban_pt", /* 36 */
|
|
221
|
+
"iban_es", /* 37 */
|
|
222
|
+
"iban_cz", /* 38 */
|
|
223
|
+
"iban_ro", /* 39 */
|
|
224
|
+
"iban_se", /* 40 */
|
|
225
|
+
"iban_de", /* 41 */
|
|
226
|
+
"iban_ie", /* 42 */
|
|
227
|
+
"iban_ch", /* 43 */
|
|
228
|
+
"iban_at", /* 44 */
|
|
229
|
+
"iban_nl", /* 45 */
|
|
230
|
+
"iban_dk", /* 46 */
|
|
231
|
+
"iban_fi", /* 47 */
|
|
232
|
+
"iban_be", /* 48 */
|
|
233
|
+
"iban_no", /* 49 */
|
|
234
|
+
"email", /* 50 */
|
|
235
|
+
"phone_e164", /* 51 */
|
|
236
|
+
"brazilian_cnpj", /* 52 */
|
|
237
|
+
"brazilian_cpf", /* 53 */
|
|
238
|
+
"uuid_v4", /* 54 */
|
|
239
|
+
"ipv4", /* 55 */
|
|
240
|
+
"credit_card", /* 56 */
|
|
241
|
+
"indian_aadhaar", /* 57 */
|
|
242
|
+
"mexican_curp", /* 58 */
|
|
243
|
+
"italian_cf_omocodia", /* 59 */
|
|
244
|
+
"italian_cf", /* 60 */
|
|
245
|
+
"uk_nin", /* 61 */
|
|
246
|
+
"spanish_nie", /* 62 */
|
|
247
|
+
"passport_letter_prefix", /* 63 */
|
|
248
|
+
"korean_rrn", /* 64 */
|
|
249
|
+
"swiss_ahv", /* 65 */
|
|
250
|
+
"finnish_hetu", /* 66 */
|
|
251
|
+
"swedish_personnummer", /* 67 */
|
|
252
|
+
"danish_cpr", /* 68 */
|
|
253
|
+
"czech_rodne_cislo", /* 69 */
|
|
254
|
+
"us_ssn", /* 70 */
|
|
255
|
+
"us_itin", /* 71 */
|
|
256
|
+
"canadian_sin", /* 72 */
|
|
257
|
+
"australian_tfn", /* 73 */
|
|
258
|
+
"indian_pan", /* 74 */
|
|
259
|
+
"spanish_dni", /* 75 */
|
|
260
|
+
"hungarian_tax_id", /* 76 */
|
|
261
|
+
"french_nir", /* 77 */
|
|
262
|
+
"south_african_id", /* 78 */
|
|
263
|
+
"romanian_cnp", /* 79 */
|
|
264
|
+
"japanese_my_number", /* 80 */
|
|
265
|
+
"polish_pesel", /* 81 */
|
|
266
|
+
"belgian_national_number", /* 82 */
|
|
267
|
+
"norwegian_fodselsnummer", /* 83 */
|
|
268
|
+
"passport_9digits", /* 84 */
|
|
269
|
+
"dutch_bsn", /* 85 */
|
|
270
|
+
"austrian_abgabenkontonummer", /* 86 */
|
|
271
|
+
"polish_pesel_2" /* 87 */
|
|
265
272
|
};
|
|
266
273
|
|
|
267
274
|
/*
|
|
@@ -330,126 +337,132 @@ const char *pattern_strings[NUM_PATTERNS] = {
|
|
|
330
337
|
"-----BEGIN [A-Z ]*PRIVATE KEY-----",
|
|
331
338
|
/* 28: GPG Private Key Block */
|
|
332
339
|
"-----BEGIN PGP PRIVATE KEY BLOCK-----",
|
|
340
|
+
/* 29: HashiCorp Vault Service Token (hvs. + 90-120 base64url chars) */
|
|
341
|
+
"hvs\\.[A-Za-z0-9_-]{90,120}",
|
|
342
|
+
/* 30: HashiCorp Vault Batch Token (hvb. + 138-300 base64url chars) */
|
|
343
|
+
"hvb\\.[A-Za-z0-9_-]{138,300}",
|
|
344
|
+
/* 31: HashiCorp Terraform Cloud API Token (14 alphanum + .atlasv1. + 60-70 base64url chars) */
|
|
345
|
+
"[A-Za-z0-9]{14}\\.atlasv1\\.[A-Za-z0-9_=-]{60,70}",
|
|
333
346
|
|
|
334
347
|
/* ---- Tier 3: IBANs (longest → shortest) ---- */
|
|
335
|
-
/*
|
|
348
|
+
/* 32: Hungary IBAN (HU, 28 chars) */
|
|
336
349
|
"HU[0-9]{2}[0-9]{24}",
|
|
337
|
-
/*
|
|
350
|
+
/* 33: Poland IBAN (PL, 28 chars) */
|
|
338
351
|
"PL[0-9]{2}[0-9]{24}",
|
|
339
|
-
/*
|
|
352
|
+
/* 34: France IBAN (FR, 27 chars) */
|
|
340
353
|
"FR[0-9]{2}[0-9]{10}[A-Z0-9]{11}[0-9]{2}",
|
|
341
|
-
/*
|
|
354
|
+
/* 35: Italy IBAN (IT, 27 chars) */
|
|
342
355
|
"IT[0-9]{2}[A-Z][0-9]{10}[A-Z0-9]{12}",
|
|
343
|
-
/*
|
|
356
|
+
/* 36: Portugal IBAN (PT, 25 chars) */
|
|
344
357
|
"PT[0-9]{2}[0-9]{21}",
|
|
345
|
-
/*
|
|
358
|
+
/* 37: Spain IBAN (ES, 24 chars) */
|
|
346
359
|
"ES[0-9]{2}[0-9]{20}",
|
|
347
|
-
/*
|
|
360
|
+
/* 38: Czechia IBAN (CZ, 24 chars) */
|
|
348
361
|
"CZ[0-9]{2}[0-9]{20}",
|
|
349
|
-
/*
|
|
362
|
+
/* 39: Romania IBAN (RO, 24 chars) */
|
|
350
363
|
"RO[0-9]{2}[A-Z]{4}[A-Z0-9]{16}",
|
|
351
|
-
/*
|
|
364
|
+
/* 40: Sweden IBAN (SE, 24 chars) */
|
|
352
365
|
"SE[0-9]{2}[0-9]{20}",
|
|
353
|
-
/*
|
|
366
|
+
/* 41: Germany IBAN (DE, 22 chars) */
|
|
354
367
|
"DE[0-9]{2}[0-9]{18}",
|
|
355
|
-
/*
|
|
368
|
+
/* 42: Ireland IBAN (IE, 22 chars) */
|
|
356
369
|
"IE[0-9]{2}[A-Z]{4}[0-9]{14}",
|
|
357
|
-
/*
|
|
370
|
+
/* 43: Switzerland IBAN (CH, 21 chars) */
|
|
358
371
|
"CH[0-9]{2}[0-9]{5}[A-Z0-9]{12}",
|
|
359
|
-
/*
|
|
372
|
+
/* 44: Austria IBAN (AT, 20 chars) */
|
|
360
373
|
"AT[0-9]{2}[0-9]{16}",
|
|
361
|
-
/*
|
|
374
|
+
/* 45: Netherlands IBAN (NL, 18 chars) */
|
|
362
375
|
"NL[0-9]{2}[A-Z]{4}[0-9]{10}",
|
|
363
|
-
/*
|
|
376
|
+
/* 46: Denmark IBAN (DK, 18 chars) */
|
|
364
377
|
"DK[0-9]{2}[0-9]{14}",
|
|
365
|
-
/*
|
|
378
|
+
/* 47: Finland IBAN (FI, 18 chars) */
|
|
366
379
|
"FI[0-9]{2}[0-9]{14}",
|
|
367
|
-
/*
|
|
380
|
+
/* 48: Belgium IBAN (BE, 16 chars) */
|
|
368
381
|
"BE[0-9]{2}[0-9]{12}",
|
|
369
|
-
/*
|
|
382
|
+
/* 49: Norway IBAN (NO, 15 chars) */
|
|
370
383
|
"NO[0-9]{2}[0-9]{11}",
|
|
371
384
|
|
|
372
385
|
/* ---- Tier 4: Structured formats (dots, dashes, slashes, @) ---- */
|
|
373
|
-
/*
|
|
386
|
+
/* 50: Email Address */
|
|
374
387
|
"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}",
|
|
375
|
-
/*
|
|
388
|
+
/* 51: International Phone Number (E.164) */
|
|
376
389
|
"\\+[0-9]{1,3}[- ]?[0-9][0-9 -]{6,13}[0-9]",
|
|
377
|
-
/*
|
|
390
|
+
/* 52: Brazilian CNPJ (XX.XXX.XXX/XXXX-XX) */
|
|
378
391
|
"[0-9]{2}\\.[0-9]{3}\\.[0-9]{3}/[0-9]{4}-[0-9]{2}",
|
|
379
|
-
/*
|
|
392
|
+
/* 53: Brazilian CPF (XXX.XXX.XXX-XX) */
|
|
380
393
|
"[0-9]{3}\\.[0-9]{3}\\.[0-9]{3}-[0-9]{2}",
|
|
381
|
-
/*
|
|
394
|
+
/* 54: UUID v4 / Scaleway Secret Key */
|
|
382
395
|
"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
|
|
383
|
-
/*
|
|
396
|
+
/* 55: IPv4 address */
|
|
384
397
|
"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)",
|
|
385
|
-
/*
|
|
398
|
+
/* 56: Credit card numbers (Visa, Mastercard, Amex, Discover, JCB) */
|
|
386
399
|
"(4[0-9]{15}|4[0-9]{12}|5[1-5][0-9]{14}|6011[0-9]{12}|65[0-9]{14}|3[47][0-9]{13}|3[068][0-9]{11}|35[0-9]{14})",
|
|
387
|
-
/*
|
|
400
|
+
/* 57: Indian Aadhaar (XXXX XXXX XXXX or XXXX-XXXX-XXXX) */
|
|
388
401
|
"[0-9]{4}[- ][0-9]{4}[- ][0-9]{4}",
|
|
389
402
|
|
|
390
403
|
/* ---- Tier 5: Letter-anchored patterns ---- */
|
|
391
|
-
/*
|
|
404
|
+
/* 58: Mexican CURP (18 alphanum, distinctive structure) */
|
|
392
405
|
"[A-Z]{4}[0-9]{6}[HM][A-Z]{5}[A-Z0-9][0-9]",
|
|
393
|
-
/*
|
|
406
|
+
/* 59: Italian CF with omocodia (16 chars) */
|
|
394
407
|
"[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z][0-9LMNPQRSTUV]{3}[A-Z]",
|
|
395
|
-
/*
|
|
408
|
+
/* 60: Italian CF basic (16 chars) */
|
|
396
409
|
"[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]",
|
|
397
|
-
/*
|
|
410
|
+
/* 61: UK National Insurance Number (AA 99 99 99 A-D) */
|
|
398
411
|
"[A-Z]{2} ?[0-9]{2} ?[0-9]{2} ?[0-9]{2} ?[A-D]",
|
|
399
|
-
/*
|
|
412
|
+
/* 62: Spanish NIE (X/Y/Z + 7 digits + letter) */
|
|
400
413
|
"[XYZ][0-9]{7}[A-Z]",
|
|
401
|
-
/*
|
|
414
|
+
/* 63: Passport - letter prefix + digits (e.g. AB1234567) */
|
|
402
415
|
"[A-Z]{1,2}[0-9]{6,7}",
|
|
403
416
|
|
|
404
417
|
/* ---- Tier 6: Boundary-wrapped structured (dash/dot/slash separated) ---- */
|
|
405
|
-
/*
|
|
418
|
+
/* 64: South Korean RRN (YYMMDD-XXXXXXX, 14 chars with dash) */
|
|
406
419
|
"[0-9]{6}-[0-9]{7}",
|
|
407
|
-
/*
|
|
420
|
+
/* 65: Swiss AHV Number (756.XXXX.XXXX.XX) */
|
|
408
421
|
"756\\.[0-9]{4}\\.[0-9]{4}\\.[0-9]{2}",
|
|
409
|
-
/*
|
|
422
|
+
/* 66: Finnish HETU (DDMMYY[+-A]XXXC) */
|
|
410
423
|
"[0-9]{6}[-+A][0-9]{3}[0-9A-Y]",
|
|
411
|
-
/*
|
|
424
|
+
/* 67: Swedish Personnummer (YYMMDD[-+]XXXX) */
|
|
412
425
|
"[0-9]{6}[-+][0-9]{4}",
|
|
413
|
-
/*
|
|
426
|
+
/* 68: Danish CPR Number (DDMMYY-XXXX) */
|
|
414
427
|
"[0-9]{6}-[0-9]{4}",
|
|
415
|
-
/*
|
|
428
|
+
/* 69: Czech Rodné číslo (YYMMDD/XXXX or YYMMDDXXXX) */
|
|
416
429
|
"[0-9]{6}/?[0-9]{3,4}",
|
|
417
|
-
/*
|
|
430
|
+
/* 70: US Social Security Number (XXX-XX-XXXX) */
|
|
418
431
|
"[0-9]{3}-[0-9]{2}-[0-9]{4}",
|
|
419
|
-
/*
|
|
432
|
+
/* 71: US ITIN (9XX-XX-XXXX) */
|
|
420
433
|
"9[0-9]{2}-[0-9]{2}-[0-9]{4}",
|
|
421
|
-
/*
|
|
434
|
+
/* 72: Canadian SIN (XXX-XXX-XXX) */
|
|
422
435
|
"[0-9]{3}-[0-9]{3}-[0-9]{3}",
|
|
423
|
-
/*
|
|
436
|
+
/* 73: Australian TFN (XXX-XXX-XXX or XXX XXX XXX) */
|
|
424
437
|
"[0-9]{3}[- ][0-9]{3}[- ][0-9]{3}",
|
|
425
|
-
/*
|
|
438
|
+
/* 74: Indian PAN (5 letters + 4 digits + 1 letter) */
|
|
426
439
|
"[A-Z]{5}[0-9]{4}[A-Z]",
|
|
427
|
-
/*
|
|
440
|
+
/* 75: Spanish DNI (8 digits + 1 letter) */
|
|
428
441
|
"[0-9]{8}[A-Z]",
|
|
429
|
-
/*
|
|
442
|
+
/* 76: Hungarian Tax ID (starts with 8, 10 digits) */
|
|
430
443
|
"8[0-9]{9}",
|
|
431
444
|
|
|
432
445
|
/* ---- Tier 7: Boundary-wrapped pure digits (longest → shortest) ---- */
|
|
433
|
-
/*
|
|
446
|
+
/* 77: French NIR / Social Security (15 digits) */
|
|
434
447
|
"[12][0-9]{2}[01][0-9][0-9]{2}[0-9]{3}[0-9]{3}[0-9]{2}",
|
|
435
|
-
/*
|
|
448
|
+
/* 78: South African ID (13 digits) */
|
|
436
449
|
"[0-9]{13}",
|
|
437
|
-
/*
|
|
450
|
+
/* 79: Romanian CNP (13 digits, first digit 1-8) */
|
|
438
451
|
"[1-8][0-9]{12}",
|
|
439
|
-
/*
|
|
452
|
+
/* 80: Japanese My Number (12 digits) */
|
|
440
453
|
"[0-9]{12}",
|
|
441
|
-
/*
|
|
454
|
+
/* 81: Polish PESEL (11 digits) */
|
|
442
455
|
"[0-9]{11}",
|
|
443
|
-
/*
|
|
456
|
+
/* 82: Belgian National Number (11 digits) */
|
|
444
457
|
"[0-9]{11}",
|
|
445
|
-
/*
|
|
458
|
+
/* 83: Norwegian Fødselsnummer (11 digits) */
|
|
446
459
|
"[0-9]{11}",
|
|
447
|
-
/*
|
|
460
|
+
/* 84: Passport - 9 consecutive digits */
|
|
448
461
|
"[0-9]{9}",
|
|
449
|
-
/*
|
|
462
|
+
/* 85: Dutch BSN (8-9 digits) */
|
|
450
463
|
"[0-9]{8,9}",
|
|
451
|
-
/*
|
|
464
|
+
/* 86: Austrian Abgabenkontonummer (9 digits) */
|
|
452
465
|
"[0-9]{9}",
|
|
453
|
-
/*
|
|
466
|
+
/* 87: Polish PESEL duplicate */
|
|
454
467
|
"[0-9]{11}"
|
|
455
468
|
};
|
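The three HashiCorp expressions added to `pattern_strings` above (indices 29 to 31) can be sanity-checked outside the gem. The following is a minimal sketch, not the gem's implementation: it copies the POSIX ERE strings into Ruby `Regexp` literals, whereas the gem matches them in its C extension. The constant names, the `matching_hashicorp_patterns` helper, and the sample tokens are all hypothetical; the samples are synthetic strings built to the minimum documented lengths, not real credentials.

```ruby
# Ruby translations of the three ERE strings from pattern_strings (indices 29-31).
# Illustrative only: the gem matches these in its C extension, not via Ruby Regexp.
HASHICORP_PATTERNS = {
  "hashicorp_vault_service_token" => /hvs\.[A-Za-z0-9_-]{90,120}/,
  "hashicorp_vault_batch_token"   => /hvb\.[A-Za-z0-9_-]{138,300}/,
  "hashicorp_terraform_api_token" => /[A-Za-z0-9]{14}\.atlasv1\.[A-Za-z0-9_=-]{60,70}/
}.freeze

# Synthetic samples built to the minimum documented lengths (not real tokens).
SAMPLES = {
  "hashicorp_vault_service_token" => "hvs." + "a" * 90,
  "hashicorp_vault_batch_token"   => "hvb." + "b" * 138,
  "hashicorp_terraform_api_token" => "A1b2C3d4E5f6G7" + ".atlasv1." + "c" * 60
}.freeze

# Return the names of all patterns that match somewhere in +text+.
# Hypothetical helper for this sketch; it is not part of the gem's API.
def matching_hashicorp_patterns(text)
  HASHICORP_PATTERNS.select { |_name, regex| regex.match?(text) }.keys
end

SAMPLES.each do |name, token|
  puts "#{name}: #{matching_hashicorp_patterns("token=#{token}").inspect}"
end
```

Each sample should be reported under its own pattern name only. A service token shortened to 89 characters after `hvs.` falls below the `{90,120}` bound and should match nothing.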
data/lib/data_redactor.rb
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
1
|
require "set"
|
|
2
|
+
require "json"
|
|
2
3
|
require_relative "data_redactor/version"
|
|
3
4
|
require_relative "data_redactor/data_redactor" # loads the compiled .so
|
|
4
5
|
|
|
@@ -161,6 +162,54 @@ module DataRedactor
|
|
|
161
162
|
result
|
|
162
163
|
end
|
|
163
164
|
|
|
165
|
+
# Recursively redact every String value in a nested Hash/Array structure.
|
|
166
|
+
#
|
|
167
|
+
# Walks the structure depth-first. Only String leaves are passed through
|
|
168
|
+
# {redact}; all other leaf types (Integer, Float, nil, Symbol, Boolean)
|
|
169
|
+
# are copied unchanged. Hash keys are never modified.
|
|
170
|
+
#
|
|
171
|
+
# Returns a deep copy — the original structure is never mutated.
|
|
172
|
+
#
|
|
173
|
+
# @param data [Hash, Array, String, Object] the structure to walk.
|
|
174
|
+
# Any type is accepted; non-String scalars are returned as-is.
|
|
175
|
+
# @param only [Symbol, String, Array, nil] forwarded to {redact}.
|
|
176
|
+
# @param except [Symbol, String, Array, nil] forwarded to {redact}.
|
|
177
|
+
# @param placeholder [String, :tagged, :hash] forwarded to {redact}.
|
|
178
|
+
# @return [Hash, Array, String, Object] a new structure of the same shape
|
|
179
|
+
# with all String leaves redacted.
|
|
180
|
+
# @raise [ArgumentError] if the structure contains a circular reference.
|
|
181
|
+
#
|
|
182
|
+
# @example Rails params
|
|
183
|
+
# safe = DataRedactor.redact_deep(params.to_h)
|
|
184
|
+
#
|
|
185
|
+
# @example Mixed filter
|
|
186
|
+
# DataRedactor.redact_deep(payload, only: :credentials, placeholder: :tagged)
|
|
187
|
+
def redact_deep(data, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
|
|
188
|
+
_walk(data, only: only, except: except, placeholder: placeholder, seen: Set.new)
|
|
189
|
+
end
|
|
190
|
+
|
|
191
|
+
# Parse +json_string+, redact every String value in the resulting structure,
|
|
192
|
+
# and return valid JSON.
|
|
193
|
+
#
|
|
194
|
+
# Delegates traversal to {redact_deep}. All keyword arguments are forwarded
|
|
195
|
+
# to {redact}.
|
|
196
|
+
#
|
|
197
|
+
# @param json_string [String] valid JSON input.
|
|
198
|
+
# @param only [Symbol, String, Array, nil] forwarded to {redact}.
|
|
199
|
+
# @param except [Symbol, String, Array, nil] forwarded to {redact}.
|
|
200
|
+
# @param placeholder [String, :tagged, :hash] forwarded to {redact}.
|
|
201
|
+
# @return [String] a JSON string with all String values redacted.
|
|
202
|
+
# @raise [JSON::ParserError] if +json_string+ is not valid JSON.
|
|
203
|
+
#
|
|
204
|
+
# @example
|
|
205
|
+
# DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
|
|
206
|
+
# # => '{"email":"[REDACTED]","count":3}'
|
|
207
|
+
def redact_json(json_string, only: nil, except: nil, placeholder: PLACEHOLDER_DEFAULT)
|
|
208
|
+
parsed = JSON.parse(json_string)
|
|
209
|
+
redacted = redact_deep(parsed, only: only, except: except, placeholder: placeholder)
|
|
210
|
+
JSON.generate(redacted)
|
|
211
|
+
end
|
|
212
|
+
|
|
164
213
|
# Register a custom redaction pattern.
|
|
165
214
|
#
|
|
166
215
|
# Patterns must be valid POSIX ERE. Ruby-only syntax (+\d+, +\s+, +\w+,
|
|
@@ -317,6 +366,31 @@ module DataRedactor
|
|
|
317
366
|
bits
|
|
318
367
|
end
|
|
319
368
|
|
|
369
|
+
# @api private
|
|
370
|
+
# Depth-first recursive walker for {redact_deep}.
|
|
371
|
+
# +seen+ is a Set of object_ids already on the current traversal stack,
|
|
372
|
+
# used to detect circular references.
|
|
373
|
+
def _walk(node, only:, except:, placeholder:, seen:)
|
|
374
|
+
case node
|
|
375
|
+
when String
|
|
376
|
+
redact(node, only: only, except: except, placeholder: placeholder)
|
|
377
|
+
when Hash
|
|
378
|
+
raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
|
|
379
|
+
seen.add(node.object_id)
|
|
380
|
+
result = node.transform_values { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
|
|
381
|
+
seen.delete(node.object_id)
|
|
382
|
+
result
|
|
383
|
+
when Array
|
|
384
|
+
raise ArgumentError, "redact_deep: circular reference detected" if seen.include?(node.object_id)
|
|
385
|
+
seen.add(node.object_id)
|
|
386
|
+
result = node.map { |v| _walk(v, only: only, except: except, placeholder: placeholder, seen: seen) }
|
|
387
|
+
seen.delete(node.object_id)
|
|
388
|
+
result
|
|
389
|
+
else
|
|
390
|
+
node
|
|
391
|
+
end
|
|
392
|
+
end
|
|
393
|
+
|
|
320
394
|
# @api private
|
|
321
395
|
def pattern_enabled?(name, tag_bit, only_present, only_bits, only_names,
|
|
322
396
|
except_bits, except_names)
|
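The traversal added in `_walk` above can be tried without the compiled extension. The sketch below is a self-contained approximation under stated assumptions: `stub_redact` is a hypothetical stand-in for `DataRedactor.redact` that only masks email-like strings, and `walk_sketch` and `json_sketch` are illustrative names, not part of the gem. It keeps the same idea as the diff: `seen` tracks only the containers on the current traversal path.

```ruby
require "set"
require "json"

# Stand-in for DataRedactor.redact: masks anything that looks like an email.
# Hypothetical helper for this sketch only; the real gem applies its full pattern table.
def stub_redact(text)
  text.gsub(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/, "[REDACTED]")
end

# Depth-first copy of +node+ with every String leaf passed through stub_redact.
# +seen+ holds object_ids of containers on the current path only, so a value
# shared by two siblings is accepted while a true cycle raises ArgumentError.
def walk_sketch(node, seen = Set.new)
  case node
  when String
    stub_redact(node)
  when Hash, Array
    raise ArgumentError, "circular reference detected" if seen.include?(node.object_id)

    seen.add(node.object_id)
    result =
      if node.is_a?(Hash)
        node.transform_values { |v| walk_sketch(v, seen) } # keys are left untouched
      else
        node.map { |v| walk_sketch(v, seen) }
      end
    seen.delete(node.object_id)
    result
  else
    node # Integer, Float, nil, true/false, Symbol: returned unchanged
  end
end

# JSON round trip in the same shape as redact_json: parse, walk, re-serialise.
def json_sketch(json_string)
  JSON.generate(walk_sketch(JSON.parse(json_string)))
end

# The same array referenced twice is not a cycle and is redacted in both places.
shared = ["alice@example.com"]
puts walk_sketch({ "a" => shared, "b" => shared, "count" => 3 }).inspect

# A hash that contains itself is a cycle and raises.
cyclic = {}
cyclic["self"] = cyclic
begin
  walk_sketch(cyclic)
rescue ArgumentError => e
  puts "raised: #{e.message}"
end

puts json_sketch('{"email":"alice@example.com","count":3}')
```

Because each container's id is removed from `seen` once its children have been visited, only a container that appears inside itself triggers the error. A plain "visited" set that is never cleared would also reject harmless shared references.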
data/readme.md
CHANGED
|
@@ -103,6 +103,36 @@ DataRedactor.scan(text, except: :network)
|
|
|
103
103
|
DataRedactor.scan(text, only: :contact, except: ["email"])
|
|
104
104
|
```
|
|
105
105
|
|
|
106
|
+
### Hash / JSON traversal
|
|
107
|
+
|
|
108
|
+
Redact every string value inside a nested Hash or Array — useful for params hashes, Sidekiq job payloads, webhook bodies, and anything that isn't a flat string:
|
|
109
|
+
|
|
110
|
+
```ruby
|
|
111
|
+
# Hash — returns a deep copy, never mutates the input
|
|
112
|
+
result = DataRedactor.redact_deep({
|
|
113
|
+
"user" => { "email" => "alice@example.com" },
|
|
114
|
+
"count" => 3,
|
|
115
|
+
"tags" => ["admin", "alice@example.com"]
|
|
116
|
+
})
|
|
117
|
+
# => { "user" => { "email" => "[REDACTED]" }, "count" => 3, "tags" => ["admin", "[REDACTED]"] }
|
|
118
|
+
|
|
119
|
+
# Hash keys are never touched — only values are redacted
|
|
120
|
+
# Non-string scalars (Integer, Float, nil, Boolean) pass through unchanged
|
|
121
|
+
|
|
122
|
+
# Accepts the same filters as redact
|
|
123
|
+
DataRedactor.redact_deep(params, only: :credentials)
|
|
124
|
+
DataRedactor.redact_deep(payload, except: :network, placeholder: :tagged)
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
```ruby
|
|
128
|
+
# JSON string — parse → redact_deep → re-serialise
|
|
129
|
+
safe_json = DataRedactor.redact_json('{"email":"alice@example.com","count":3}')
|
|
130
|
+
# => '{"email":"[REDACTED]","count":3}'
|
|
131
|
+
|
|
132
|
+
# Raises JSON::ParserError on invalid input
|
|
133
|
+
DataRedactor.redact_json("not json") # raises JSON::ParserError
|
|
134
|
+
```
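The parse, redact, re-serialise pipeline described above can be sketched with only the standard `json` library. This is an illustration, not the gem's code: `mask_strings` and `mask_json` are hypothetical names, and the single email regex stands in for the gem's pattern set.

```ruby
require "json"

# Hypothetical stand-in for DataRedactor.redact_deep: masks email-like
# substrings in every String value and leaves keys and other scalars alone.
def mask_strings(node)
  case node
  when String then node.gsub(/[\w.+-]+@[\w-]+(\.[\w-]+)+/, "[REDACTED]")
  when Hash   then node.transform_values { |v| mask_strings(v) }
  when Array  then node.map { |v| mask_strings(v) }
  else node
  end
end

# Parse, mask, and serialise again. JSON.parse raises JSON::ParserError on
# invalid input, so the error surfaces before any masking happens.
def mask_json(json_string)
  parsed = JSON.parse(json_string)
  JSON.generate(mask_strings(parsed))
end

puts mask_json('{ "email": "alice@example.com", "count": 3, "tags": ["admin", null] }')

begin
  mask_json("not json")
rescue JSON::ParserError => e
  puts "rejected: #{e.class}"
end
```

Because the output comes from `JSON.generate`, the original whitespace and indentation of the input are not preserved; only the data is.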
|
|
135
|
+
|
|
106
136
|
### Custom patterns
|
|
107
137
|
|
|
108
138
|
Teams often have internal IDs that the gem can't ship. Register them at boot:
|
|
@@ -179,7 +209,7 @@ Pass an empty subset (e.g. `scrub: [:headers]`) to opt out of body wrapping. For
|
|
|
179
209
|
|
|
180
210
|
> **Body wrapping is buffering.** The middleware reads the entire response body into memory before scanning. For streaming endpoints (SSE, large file downloads, Rack::Hijack) use `scrub: [:headers]` and rely on the Logger formatter for application logs instead.
|
|
181
211
|
|
|
182
|
-
## Detected patterns (
|
|
212
|
+
## Detected patterns (88 total)
|
|
183
213
|
|
|
184
214
|
The table below is a representative sample. Use `DataRedactor.pattern_names` for the canonical, machine-readable list — it stays in sync with the C extension automatically.
|
|
185
215
|
|
|
@@ -294,16 +324,46 @@ redactor/
|
|
|
294
324
|
## Requirements
|
|
295
325
|
|
|
296
326
|
- Ruby >= 2.7
|
|
297
|
-
- A C compiler (`gcc` or `clang`)
|
|
298
|
-
- POSIX `regex.h` (standard on Linux and macOS)
|
|
327
|
+
- A C compiler (`gcc` or `clang`) — only required when installing the source gem
|
|
328
|
+
- POSIX `regex.h` — only required when installing the source gem (standard on Linux and macOS)
|
|
329
|
+
|
|
330
|
+
## Installation
|
|
299
331
|
|
|
300
|
-
|
|
332
|
+
```ruby
|
|
333
|
+
# Gemfile
|
|
334
|
+
gem "data_redactor"
|
|
335
|
+
```
|
|
301
336
|
|
|
302
337
|
```bash
|
|
303
338
|
bundle install
|
|
304
339
|
```
|
|
305
340
|
|
|
306
|
-
|
|
341
|
+
That's it — there is nothing extra to configure for precompiled binaries. Bundler/RubyGems looks at your platform and Ruby version and picks the right gem automatically.
|
|
342
|
+
|
|
343
|
+
### What you'll see
|
|
344
|
+
|
|
345
|
+
- **On a supported platform** (Linux glibc/musl, macOS Intel/ARM): bundler downloads a precompiled gem with the C extension already built. Install is near-instant — **no compiler, no `make`, no `regex.h` headers needed**. Especially valuable in slim Docker images (`ruby:3.x-alpine`, `ruby:3.x-slim`) that don't ship `gcc`.
|
|
346
|
+
- **On any other platform** (FreeBSD, OpenBSD, etc.): bundler downloads the source gem and compiles the C extension on install — the same behavior as before 0.7.1. You'll need a C compiler and POSIX `regex.h` available.
|
|
347
|
+
|
|
348
|
+
### Supported precompiled targets
|
|
349
|
+
|
|
350
|
+
Each precompiled gem ships compiled binaries for Ruby 3.1, 3.2, 3.3, and 3.4.
|
|
351
|
+
|
|
352
|
+
| Platform | Targets |
|
|
353
|
+
|---|---|
|
|
354
|
+
| Linux (glibc) | `x86_64-linux`, `aarch64-linux` |
|
|
355
|
+
| Linux (musl / Alpine) | `x86_64-linux-musl`, `aarch64-linux-musl` |
|
|
356
|
+
| macOS | `x86_64-darwin` (Intel), `arm64-darwin` (Apple Silicon) |
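To get a rough idea of which install path a machine will take, the local RubyGems platform can be compared with the targets in the table. The sketch below is an approximation under stated assumptions: `precompiled_targets` and `precompiled_available?` are hypothetical helpers, the check ignores the Ruby version (precompiled binaries cover 3.1 to 3.4 only), and the exact glibc/musl matching rules depend on the installed RubyGems version.

```ruby
require "rubygems"

# Precompiled targets copied from the table above.
def precompiled_targets
  %w[
    x86_64-linux aarch64-linux
    x86_64-linux-musl aarch64-linux-musl
    x86_64-darwin arm64-darwin
  ].map { |name| Gem::Platform.new(name) }
end

# True when at least one listed target matches `platform` according to
# Gem::Platform#===. This approximates, but is not, Bundler's resolution.
def precompiled_available?(platform = Gem::Platform.local)
  precompiled_targets.any? { |target| target === platform }
end

kind = precompiled_available? ? "precompiled gem expected" : "source build expected"
puts "#{Gem::Platform.local}: #{kind}"
```

A `false` result suggests the source gem will be compiled on install, so a C compiler and `regex.h` need to be present.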
|
|
357
|
+
|
|
358
|
+
### Bundler-locked deploys
|
|
359
|
+
|
|
360
|
+
If your `Gemfile.lock` was generated on one platform but you deploy to another, run `bundle lock --add-platform <target>` so bundler resolves the right native gem at deploy time. Example for Alpine deploys built from a glibc dev box:
|
|
361
|
+
|
|
362
|
+
```bash
|
|
363
|
+
bundle lock --add-platform x86_64-linux-musl aarch64-linux-musl
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
## Compile the C extension (source / development install only)
|
|
307
367
|
|
|
308
368
|
```bash
|
|
309
369
|
bundle exec rake compile
|
|
@@ -311,6 +371,16 @@ bundle exec rake compile
|
|
|
311
371
|
|
|
312
372
|
This runs `extconf.rb` via `rake-compiler`, which generates a `Makefile` and compiles `data_redactor.c` into a shared library (`.so` on Linux, `.bundle` on macOS) placed under `lib/data_redactor/`.
|
|
313
373
|
|
|
374
|
+
## Building precompiled gems locally
|
|
375
|
+
|
|
376
|
+
Maintainers can rebuild the full set of native gems with one command (requires Docker):
|
|
377
|
+
|
|
378
|
+
```bash
|
|
379
|
+
bundle exec rake gem:all
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
This invokes `rake-compiler-dock` to cross-compile every supported (platform × Ruby ABI) combination. Output lands in `pkg/`.
|
|
383
|
+
|
|
314
384
|
## Run the tests
|
|
315
385
|
|
|
316
386
|
```bash
|
metadata
CHANGED
|
@@ -1,14 +1,13 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: data_redactor
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.8.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Daniele Frisanco
|
|
8
|
-
autorequire:
|
|
9
8
|
bindir: bin
|
|
10
9
|
cert_chain: []
|
|
11
|
-
date:
|
|
10
|
+
date: 1980-01-02 00:00:00.000000000 Z
|
|
12
11
|
dependencies:
|
|
13
12
|
- !ruby/object:Gem::Dependency
|
|
14
13
|
name: rake-compiler
|
|
@@ -24,6 +23,20 @@ dependencies:
|
|
|
24
23
|
- - "~>"
|
|
25
24
|
- !ruby/object:Gem::Version
|
|
26
25
|
version: '1.2'
|
|
26
|
+
- !ruby/object:Gem::Dependency
|
|
27
|
+
name: rake-compiler-dock
|
|
28
|
+
requirement: !ruby/object:Gem::Requirement
|
|
29
|
+
requirements:
|
|
30
|
+
- - "~>"
|
|
31
|
+
- !ruby/object:Gem::Version
|
|
32
|
+
version: '1.5'
|
|
33
|
+
type: :development
|
|
34
|
+
prerelease: false
|
|
35
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
36
|
+
requirements:
|
|
37
|
+
- - "~>"
|
|
38
|
+
- !ruby/object:Gem::Version
|
|
39
|
+
version: '1.5'
|
|
27
40
|
- !ruby/object:Gem::Dependency
|
|
28
41
|
name: rspec
|
|
29
42
|
requirement: !ruby/object:Gem::Requirement
|
|
@@ -108,7 +121,6 @@ metadata:
|
|
|
108
121
|
changelog_uri: https://github.com/danielefrisanco/data_redactor/blob/main/CHANGELOG.md
|
|
109
122
|
bug_tracker_uri: https://github.com/danielefrisanco/data_redactor/issues
|
|
110
123
|
rubygems_mfa_required: 'true'
|
|
111
|
-
post_install_message:
|
|
112
124
|
rdoc_options: []
|
|
113
125
|
require_paths:
|
|
114
126
|
- lib
|
|
@@ -123,8 +135,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
123
135
|
- !ruby/object:Gem::Version
|
|
124
136
|
version: '0'
|
|
125
137
|
requirements: []
|
|
126
|
-
rubygems_version: 3.
|
|
127
|
-
signing_key:
|
|
138
|
+
rubygems_version: 3.6.9
|
|
128
139
|
specification_version: 4
|
|
129
140
|
summary: Redact PII and secrets from strings before sending to AI or external services
|
|
130
141
|
test_files: []
|