column_anonymizer 0.1.0 → 0.2.0

@@ -1,515 +0,0 @@
1
- # Custom Anonymization Generators
2
-
3
- ## Overview
4
-
5
- Column Anonymizer allows you to define **custom anonymization types** from within your Rails application. This is perfect for:
6
-
7
- - Domain-specific data formats (employee IDs, account numbers, etc.)
8
- - Company-specific data patterns
9
- - Specialized anonymization requirements
10
- - Custom business logic
11
-
12
- ## Quick Start
13
-
14
- ### 1. Generate the Initializer
15
-
16
- ```bash
17
- rails generate column_anonymizer:initializer
18
- ```
19
-
20
- This creates `config/initializers/column_anonymizer.rb` with example generators.
21
-
22
- ### 2. Register Your Custom Generator
23
-
24
- ```ruby
25
- # config/initializers/column_anonymizer.rb
26
-
27
- ColumnAnonymizer::Anonymizer.register(:credit_card) do
28
- "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
29
- end
30
- ```
31
-
32
- ### 3. Use in Your YAML Config
33
-
34
- ```yaml
35
- # config/encrypted_columns.yml
36
- User:
37
- credit_card_number: credit_card # ← Uses your custom generator
38
- ```
39
-
40
- ### 4. Anonymize!
41
-
42
- ```ruby
43
- user = User.first
44
- ColumnAnonymizer::Anonymizer.anonymize_model!(user)
45
-
46
- # credit_card_number becomes: "XXXX-XXXX-XXXX-1234"
47
- ```
48
-
49
- ---
50
-
51
- ## Registration Methods
52
-
53
- ### Block Syntax (Recommended)
54
-
55
- ```ruby
56
- ColumnAnonymizer::Anonymizer.register(:custom_type) do
57
- # Return the anonymized value
58
- "ANONYMIZED-#{SecureRandom.hex(4)}"
59
- end
60
- ```
61
-
62
- ### Lambda Syntax
63
-
64
- ```ruby
65
- ColumnAnonymizer::Anonymizer.register(:uuid, -> { SecureRandom.uuid })
66
- ```
67
-
68
- ### Callable Object
69
-
70
- ```ruby
71
- class EmployeeIdGenerator
72
- def self.call
73
- "EMP-#{Time.now.year}-#{rand(10000..99999)}"
74
- end
75
- end
76
-
77
- ColumnAnonymizer::Anonymizer.register(:employee_id, EmployeeIdGenerator)
78
- ```
79
-
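The three registration styles above all reduce to "something that responds to `call`". As a rough illustration only, a registry that accepts a block, a lambda, or a callable class could look like the sketch below. `SketchRegistry` and `BadgeGenerator` are hypothetical names invented for this example. This is not the gem's actual implementation.

```ruby
# Hypothetical stand-in for a generator registry; not column_anonymizer's real code.
module SketchRegistry
  @generators = {}

  class << self
    # Accepts either an explicit callable (lambda, proc, object with #call) or a block.
    def register(type, generator = nil, &block)
      callable = generator || block
      raise ArgumentError, "generator or block required" if callable.nil?
      raise ArgumentError, "generator must respond to #call" unless callable.respond_to?(:call)

      # Normalize the key so :badge and "badge" refer to the same entry.
      @generators[type.to_sym] = callable
    end

    # Looks up the generator and invokes it with no arguments.
    def generate(type)
      @generators.fetch(type.to_sym).call
    end
  end
end

# Callable class: any object with a zero-argument #call works.
class BadgeGenerator
  def self.call
    "BADGE-#{rand(1000..9999)}"
  end
end

SketchRegistry.register(:masked) { "XXXX" }        # block
SketchRegistry.register(:fixed, -> { "CONST" })    # lambda
SketchRegistry.register("badge", BadgeGenerator)   # callable class, string key
```

Normalizing keys with `to_sym` is one way to let the type be given as either a Symbol or a String, as the API reference describes.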
80
- ---
81
-
82
- ## Example Custom Generators
83
-
84
- ### Simple Masked Format
85
-
86
- ```ruby
87
- ColumnAnonymizer::Anonymizer.register(:credit_card) do
88
- "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
89
- end
90
- ```
91
-
92
- ### Using Faker
93
-
94
- ```ruby
95
- ColumnAnonymizer::Anonymizer.register(:company_name) do
96
- Faker::Company.name
97
- end
98
-
99
- ColumnAnonymizer::Anonymizer.register(:job_title) do
100
- Faker::Job.title
101
- end
102
- ```
103
-
104
- ### Custom Business Logic
105
-
106
- ```ruby
107
- ColumnAnonymizer::Anonymizer.register(:account_number) do
108
- prefix = "ACC"
109
- year = Time.now.year
110
- sequence = rand(10000..99999)
111
- "#{prefix}#{year}#{sequence}"
112
- end
113
- ```
114
-
115
- ### Format-Specific Patterns
116
-
117
- ```ruby
118
- ColumnAnonymizer::Anonymizer.register(:license_plate) do
119
- letters = ('A'..'Z').to_a.sample(3).join
120
- numbers = rand(100..999)
121
- "#{letters}-#{numbers}"
122
- end
123
-
124
- ColumnAnonymizer::Anonymizer.register(:vin) do
125
- chars = ('A'..'Z').to_a - %w[I O Q] + ('0'..'9').to_a # VINs never use I, O, or Q
126
- 17.times.map { chars.sample }.join
127
- end
128
- ```
129
-
130
- ### Using Rails Models/Data
131
-
132
- ```ruby
133
- ColumnAnonymizer::Anonymizer.register(:department_name) do
134
- Department.active.pluck(:name).sample || "General"
135
- end
136
-
137
- ColumnAnonymizer::Anonymizer.register(:valid_country_code) do
138
- Country.all.pluck(:code).sample || "US"
139
- end
140
- ```
141
-
142
- ### Date/Time Generators
143
-
144
- ```ruby
145
- ColumnAnonymizer::Anonymizer.register(:birth_year) do
146
- rand(1950..2005).to_s
147
- end
148
-
149
- ColumnAnonymizer::Anonymizer.register(:recent_timestamp) do
150
- (Time.now - rand(1..365).days).iso8601
151
- end
152
- ```
153
-
154
- ### Complex Multi-Field Logic
155
-
156
- ```ruby
157
- ColumnAnonymizer::Anonymizer.register(:medical_record_number) do
158
- hospital_code = "HSP#{rand(100..999)}"
159
- year = Time.now.year.to_s[2..3]
160
- patient_seq = rand(100000..999999)
161
- "#{hospital_code}-#{year}-#{patient_seq}"
162
- end
163
- ```
164
-
165
- ### Deterministic (Repeatable) Anonymization
166
-
167
- ```ruby
168
- ColumnAnonymizer::Anonymizer.register(:deterministic_id) do
169
- # Caution: not repeatable as written. Generators receive no input, and `rand` changes on every call.
170
- Digest::SHA256.hexdigest("#{Time.now.to_date}-#{rand}")[0..15]
171
- end
172
- ```
173
-
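For output that really is repeatable, the original value has to feed into the result. One common approach is a keyed hash (HMAC) of the input. The sketch below is a standalone helper, not part of column_anonymizer: registered generators take no arguments, so `deterministic_token` and its `secret:` parameter are assumptions for illustration and would have to be called from your own code.

```ruby
require "openssl"

# Hypothetical helper: maps the same input to the same token for a given secret.
# Without the secret, the original value cannot be recomputed from the token.
def deterministic_token(value, secret:, length: 16)
  OpenSSL::HMAC.hexdigest("SHA256", secret, value.to_s)[0, length]
end

secret = "replace-with-a-real-secret" # assumption: loaded from credentials in practice

first  = deterministic_token("john@example.com", secret: secret)
second = deterministic_token("john@example.com", secret: secret)
other  = deterministic_token("jane@example.com", secret: secret)
# first and second are identical; other differs.
```

Truncating to 16 hex characters keeps the token short at the cost of a higher collision chance, so pick the length to suit the size of your data set.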
174
- ---
175
-
176
- ## API Reference
177
-
178
- ### `register(type, generator = nil, &block)`
179
-
180
- Register a custom anonymization generator.
181
-
182
- **Parameters:**
183
- - `type` (Symbol/String) - The identifier to use in YAML config
184
- - `generator` (Proc/#call) - Optional callable object
185
- - `&block` - Block that returns anonymized value
186
-
187
- **Examples:**
188
-
189
- ```ruby
190
- # With block
191
- ColumnAnonymizer::Anonymizer.register(:custom) do
192
- "CUSTOM-#{SecureRandom.hex(4)}"
193
- end
194
-
195
- # With lambda
196
- ColumnAnonymizer::Anonymizer.register(:uuid, -> { SecureRandom.uuid })
197
-
198
- # With callable class
199
- ColumnAnonymizer::Anonymizer.register(:employee_id, EmployeeIdGenerator)
200
- ```
201
-
202
- ### `unregister(type)`
203
-
204
- Remove a custom generator.
205
-
206
- ```ruby
207
- ColumnAnonymizer::Anonymizer.unregister(:credit_card)
208
- ```
209
-
210
- ### `all_generators`
211
-
212
- Get all available generators (built-in + custom).
213
-
214
- ```ruby
215
- generators = ColumnAnonymizer::Anonymizer.all_generators
216
- # => { email: #<Proc>, phone: #<Proc>, credit_card: #<Proc>, ... }
217
- ```
218
-
219
- ### `generator_exists?(type)`
220
-
221
- Check if a generator is registered.
222
-
223
- ```ruby
224
- ColumnAnonymizer::Anonymizer.generator_exists?(:credit_card)
225
- # => true
226
- ```
227
-
228
- ### `reset_custom_generators!`
229
-
230
- Clear all custom generators (useful for testing).
231
-
232
- ```ruby
233
- ColumnAnonymizer::Anonymizer.reset_custom_generators!
234
- ```
235
-
236
- ---
237
-
238
- ## Built-in Generators
239
-
240
- No need to register these - they're available by default:
241
-
242
- | Type | Example Output | Description |
243
- |------|----------------|-------------|
244
- | `:email` | `user_a1b2@example.com` | Fake email addresses |
245
- | `:phone` | `+15551234567` | Fake phone numbers |
246
- | `:ssn` | `123-45-6789` | Fake Social Security Numbers |
247
- | `:name` | `John Doe` | Fake full names |
248
- | `:first_name` | `John` | Fake first names |
249
- | `:last_name` | `Smith` | Fake last names |
250
- | `:address` | `1234 Main St` | Fake addresses |
251
- | `:text` | `Lorem ipsum...` | Lorem ipsum text |
252
-
253
- ---
254
-
255
- ## Complete Workflow Example
256
-
257
- ### Step 1: Create Initializer
258
-
259
- ```bash
260
- rails generate column_anonymizer:initializer
261
- ```
262
-
263
- ### Step 2: Register Custom Generators
264
-
265
- ```ruby
266
- # config/initializers/column_anonymizer.rb
267
-
268
- # Credit card masking
269
- ColumnAnonymizer::Anonymizer.register(:credit_card) do
270
- "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
271
- end
272
-
273
- # Employee ID
274
- ColumnAnonymizer::Anonymizer.register(:employee_id) do
275
- "EMP-#{Time.now.year}-#{rand(10000..99999)}"
276
- end
277
-
278
- # Medical record number
279
- ColumnAnonymizer::Anonymizer.register(:mrn) do
280
- "MRN#{rand(100000..999999)}"
281
- end
282
- ```
283
-
284
- ### Step 3: Update YAML Config
285
-
286
- ```yaml
287
- # config/encrypted_columns.yml
288
- User:
289
- email: email # Built-in
290
- phone: phone # Built-in
291
- credit_card_number: credit_card # Custom!
292
-
293
- Employee:
294
- employee_number: employee_id # Custom!
295
- ssn: ssn # Built-in
296
-
297
- Patient:
298
- medical_record_number: mrn # Custom!
299
- ```
300
-
301
- ### Step 4: Use Models with Encryption
302
-
303
- ```ruby
304
- class User < ApplicationRecord
305
- encrypts :email
306
- encrypts :phone
307
- encrypts :credit_card_number
308
- end
309
-
310
- class Employee < ApplicationRecord
311
- encrypts :employee_number
312
- encrypts :ssn
313
- end
314
-
315
- class Patient < ApplicationRecord
316
- encrypts :medical_record_number
317
- end
318
- ```
319
-
320
- ### Step 5: Anonymize Data
321
-
322
- ```ruby
323
- # Single record
324
- user = User.first
325
- ColumnAnonymizer::Anonymizer.anonymize_model!(user)
326
-
327
- # Batch anonymization
328
- User.find_each do |user|
329
- ColumnAnonymizer::Anonymizer.anonymize_model!(user)
330
- end
331
-
332
- # Before:
333
- # user.credit_card_number => "4532-1234-5678-9012"
334
- # user.email => "john@example.com"
335
-
336
- # After:
337
- # user.credit_card_number => "XXXX-XXXX-XXXX-7823"
338
- # user.email => "user_abc123@example.com"
339
- ```
340
-
341
- ---
342
-
343
- ## Advanced Patterns
344
-
345
- ### Context-Aware Generators
346
-
347
- ```ruby
348
- # Generator that uses model instance context
349
- # Note: Generators receive no arguments, so you'd need to
350
- # handle context via other means (e.g., thread-local variables)
351
-
352
- ColumnAnonymizer::Anonymizer.register(:age_appropriate_name) do
353
- # Generate age-appropriate names
354
- birth_year = Thread.current[:anonymizing_birth_year]
355
- if birth_year && birth_year > 2000
356
- Faker::Name.first_name # Modern name
357
- else
358
- Faker::Name.middle_name # Classic name
359
- end
360
- end
361
- ```
362
-
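If you do pass context through `Thread.current`, it helps to set and clear it in one place so a value never leaks into the next record. A minimal sketch follows, assuming a hypothetical `with_anonymizing_context` helper and a stand-in lambda in place of a registered generator.

```ruby
# Hypothetical helper: sets the context for the duration of the block and
# restores the previous value afterwards, even if the block raises.
def with_anonymizing_context(birth_year:)
  previous = Thread.current[:anonymizing_birth_year]
  Thread.current[:anonymizing_birth_year] = birth_year
  yield
ensure
  Thread.current[:anonymizing_birth_year] = previous
end

# Stand-in for a registered generator that reads the context.
name_style = lambda do
  (Thread.current[:anonymizing_birth_year] || 0) > 2000 ? "modern" : "classic"
end

inside  = with_anonymizing_context(birth_year: 2010) { name_style.call }
outside = name_style.call
```

Note that `Thread.current[:key]` storage is fiber-local in Ruby, so the value is not visible from other threads or fibers.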
363
- ### Locale-Specific Generators
364
-
365
- ```ruby
366
- ColumnAnonymizer::Anonymizer.register(:localized_address) do
367
- # with_locale restores the previous Faker locale once the block returns
368
- Faker::Base.with_locale(:en) { Faker::Address.full_address }
369
- end
370
-
371
- ColumnAnonymizer::Anonymizer.register(:japanese_name) do
372
- # Scoped locale, so later generators are not left running in :ja
373
- Faker::Base.with_locale(:ja) { Faker::Name.name }
374
- end
375
- ```
376
-
377
- ### Stateful Generators (Use with Caution)
378
-
379
- ```ruby
380
- # Counter-based generator
381
- counter = 0
382
- ColumnAnonymizer::Anonymizer.register(:sequential_id) do
383
- counter += 1
384
- "USER-#{counter.to_s.rjust(6, '0')}"
385
- end
386
- ```
387
-
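The closure-based counter above is not safe if generators run on several threads at once, because `counter += 1` is a read followed by a write. One possible way to guard it is a small callable object with a `Mutex`. `SequentialIdGenerator` is a hypothetical class written for this sketch.

```ruby
# Hypothetical callable: a counter protected by a mutex.
class SequentialIdGenerator
  def initialize(prefix: "USER", width: 6)
    @prefix = prefix
    @width = width
    @counter = 0
    @lock = Mutex.new
  end

  # Responds to #call with no arguments, like the other generator examples.
  def call
    n = @lock.synchronize { @counter += 1 }
    "#{@prefix}-#{n.to_s.rjust(@width, '0')}"
  end
end

generator = SequentialIdGenerator.new

# Four threads each request one ID; sorting makes the result order-independent.
ids = 4.times.map { Thread.new { generator.call } }.map(&:value).sort
```

The counter still lives in process memory, so it restarts from zero on every boot and is not shared between processes.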
388
- ---
389
-
390
- ## Testing Your Custom Generators
391
-
392
- ```ruby
393
- # spec/initializers/column_anonymizer_spec.rb
394
-
395
- RSpec.describe "Custom Anonymizers" do
396
- before do
397
- # Reset to clean state
398
- ColumnAnonymizer::Anonymizer.reset_custom_generators!
399
- end
400
-
401
- describe "credit_card generator" do
402
- before do
403
- ColumnAnonymizer::Anonymizer.register(:credit_card) do
404
- "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
405
- end
406
- end
407
-
408
- it "generates masked credit card format" do
409
- generator = ColumnAnonymizer::Anonymizer.all_generators[:credit_card]
410
- result = generator.call
411
-
412
- expect(result).to match(/\AXXXX-XXXX-XXXX-\d{4}\z/)
413
- end
414
-
415
- it "is registered and exists" do
416
- expect(ColumnAnonymizer::Anonymizer.generator_exists?(:credit_card)).to be true
417
- end
418
- end
419
-
420
- describe "employee_id generator" do
421
- before do
422
- ColumnAnonymizer::Anonymizer.register(:employee_id) do
423
- "EMP-#{Time.now.year}-#{rand(10000..99999)}"
424
- end
425
- end
426
-
427
- it "generates employee ID format" do
428
- generator = ColumnAnonymizer::Anonymizer.all_generators[:employee_id]
429
- result = generator.call
430
-
431
- expect(result).to match(/\AEMP-\d{4}-\d{5}\z/)
432
- end
433
- end
434
- end
435
- ```
436
-
437
- ---
438
-
439
- ## Best Practices
440
-
441
- ### ✅ Do's
442
-
443
- - **Keep generators simple** - Single responsibility
444
- - **Make them fast** - Called potentially thousands of times
445
- - **Test your generators** - Ensure they produce valid data
446
- - **Use meaningful type names** - `credit_card` not `cc`
447
- - **Document your generators** - Add comments explaining the format
448
- - **Use built-ins when possible** - Don't reinvent the wheel
449
-
450
- ### ❌ Don'ts
451
-
452
- - **Don't make database calls in every generator** - Cache data if needed
453
- - **Don't use external APIs** - Too slow and unreliable
454
- - **Don't make generators stateful** - Unless you have a good reason
455
- - **Don't expose sensitive data** - Ensure anonymization is truly anonymous
456
- - **Don't make generators too complex** - Keep logic simple
457
-
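To illustrate the "cache data if needed" advice, the sketch below loads a list once and samples from memory afterwards. `CachedPool` is a hypothetical helper, and the block stands in for a query such as `Department.active.pluck(:name)`.

```ruby
# Hypothetical helper: runs the loader once, then samples from the cached list.
class CachedPool
  def initialize(fallback:, &loader)
    @fallback = fallback
    @loader = loader
    @lock = Mutex.new
    @values = nil
  end

  def sample
    # Load lazily on first use; later calls reuse the frozen array.
    values = @lock.synchronize { @values ||= @loader.call.freeze }
    values.sample || @fallback
  end
end

query_count = 0
departments = CachedPool.new(fallback: "General") do
  query_count += 1 # counts how often the "query" actually runs
  %w[Sales Support Engineering]
end

picks = 1000.times.map { departments.sample }
empty_pick = CachedPool.new(fallback: "General") { [] }.sample
```

The trade-off is staleness: rows added after the first call are not seen until the process restarts or the pool is rebuilt.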
458
- ---
459
-
460
- ## Troubleshooting
461
-
462
- ### Generator Not Found
463
-
464
- ```ruby
465
- # Error: No generator for type :custom_type
466
-
467
- # Solution: Make sure it's registered
468
- ColumnAnonymizer::Anonymizer.register(:custom_type) do
469
- "CUSTOM-#{SecureRandom.hex(4)}"
470
- end
471
- ```
472
-
473
- ### Generator Returns Nil
474
-
475
- ```ruby
476
- # Problem: Generator doesn't return a value
477
- ColumnAnonymizer::Anonymizer.register(:broken) do
478
- puts "Oops, no return value"
479
- # The block's last expression is `puts`, which returns nil
480
- end
481
-
482
- # Solution: Ensure generator returns a value
483
- ColumnAnonymizer::Anonymizer.register(:fixed) do
484
- "VALUE-#{SecureRandom.hex(4)}"
485
- end
486
- ```
487
-
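One way to catch this class of mistake early is to wrap a generator so that a `nil` result fails loudly instead of being written to the column. The wrapper below is a hypothetical helper for illustration, not something the gem is known to provide.

```ruby
# Hypothetical wrapper: returns a lambda that raises if the generator yields nil.
def guard_against_nil(type, &generator)
  lambda do
    value = generator.call
    raise ArgumentError, "generator #{type.inspect} returned nil" if value.nil?

    value
  end
end

broken = guard_against_nil(:broken) { nil }          # simulates a block with no useful return value
fixed  = guard_against_nil(:fixed) { "VALUE-1234" }
```

Calling `fixed.call` returns the string, while `broken.call` raises an `ArgumentError` that names the offending type.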
488
- ### Initializer Not Loading
489
-
490
- ```bash
491
- # Make sure initializer is in the right place
492
- ls config/initializers/column_anonymizer.rb
493
-
494
- # Restart Rails server after changes
495
- rails restart
496
- ```
497
-
498
- ---
499
-
500
- ## Summary
501
-
502
- Custom anonymization generators give you:
503
-
504
- ✅ **Flexibility** - Define domain-specific anonymization
505
- ✅ **Simplicity** - Easy registration with blocks
506
- ✅ **Power** - Use Faker, Rails models, or custom logic
507
- ✅ **Testability** - Easy to test in isolation
508
- ✅ **Maintainability** - Centralized in initializer
509
-
510
- Start with:
511
- ```bash
512
- rails generate column_anonymizer:initializer
513
- ```
514
-
515
- Then register your custom types and use them in your YAML config! 🚀