column_anonymizer 0.1.0
- checksums.yaml +7 -0
- data/.rspec +1 -0
- data/.rspec_status +15 -0
- data/CHANGELOG.md +49 -0
- data/CUSTOM_GENERATORS_COMPLETE.md +507 -0
- data/CUSTOM_GENERATORS_GUIDE.md +515 -0
- data/CUSTOM_GENERATORS_IMPLEMENTATION.md +471 -0
- data/CUSTOM_GENERATORS_QUICK_REF.md +95 -0
- data/FEATURE_COMPLETE.md +287 -0
- data/GEMSPEC_FIX.md +90 -0
- data/IMPLEMENTATION_SUMMARY.md +205 -0
- data/QUICK_REFERENCE.md +92 -0
- data/RAKE_TASKS_GUIDE.md +469 -0
- data/RAKE_TASKS_IMPLEMENTATION.md +363 -0
- data/RAKE_TASKS_QUICK_REF.md +164 -0
- data/README.md +389 -0
- data/Rakefile +12 -0
- data/SCAN_GENERATOR_TEST.md +141 -0
- data/WORKFLOW_GUIDE.md +368 -0
- data/YAML_MIGRATION_GUIDE.md +284 -0
- data/lib/column_anonymizer/anonymizer.rb +103 -0
- data/lib/column_anonymizer/encryptable.rb +25 -0
- data/lib/column_anonymizer/railtie.rb +15 -0
- data/lib/column_anonymizer/schema_loader.rb +44 -0
- data/lib/column_anonymizer/version.rb +5 -0
- data/lib/column_anonymizer.rb +9 -0
- data/lib/generators/column_anonymizer/initializer/initializer_generator.rb +25 -0
- data/lib/generators/column_anonymizer/initializer/templates/column_anonymizer.rb +77 -0
- data/lib/generators/column_anonymizer/install/README +46 -0
- data/lib/generators/column_anonymizer/install/install_generator.rb +36 -0
- data/lib/generators/column_anonymizer/install/templates/encrypted_columns.yml +29 -0
- data/lib/generators/column_anonymizer/scan/scan_generator.rb +250 -0
- data/lib/tasks/column_anonymizer.rake +318 -0
- metadata +108 -0
data/RAKE_TASKS_GUIDE.md
ADDED
@@ -0,0 +1,469 @@
# Rake Tasks Guide

## Overview

Column Anonymizer provides Rake tasks that anonymize your encrypted data in bulk. The tasks read `config/encrypted_columns.yml` and process the records of each configured model.

---

## Available Tasks

### 1. `rake column_anonymizer:anonymize_all`

Anonymize **all records** for **all models** defined in `config/encrypted_columns.yml`.

**Usage:**
```bash
rake column_anonymizer:anonymize_all
```

**Example Output:**
```
🔍 Found 3 model(s) in configuration
======================================================================

📋 Processing User...
Columns: email, phone, ssn
Records: 1,523
✅ Anonymized 1,523 record(s)

📋 Processing Employee...
Columns: employee_number, ssn
Records: 847
✅ Anonymized 847 record(s)

📋 Processing Patient...
Columns: medical_record_number
Records: 2,104
✅ Anonymized 2,104 record(s)

======================================================================
🎉 Anonymization complete!
Total records anonymized: 4,474
======================================================================
```

**Features:**
- ✅ Processes all models in configuration
- ✅ Shows progress every 100 records
- ✅ Handles errors gracefully
- ✅ Shows summary statistics

---

### 2. `rake column_anonymizer:anonymize_model[ModelName]`

Anonymize **all records** for a **specific model**.

**Usage:**
```bash
rake column_anonymizer:anonymize_model[User]
rake column_anonymizer:anonymize_model[Employee]
```

**Example Output:**
```
📋 Anonymizing User
Columns: email, phone, ssn
Records: 1,523
Progress: 1523/1523
✅ Anonymized 1,523 record(s)
```

**Features:**
- ✅ Focuses on one model
- ✅ Faster than `anonymize_all` when only one model needs processing
- ✅ Shows progress

---

### 3. `rake column_anonymizer:anonymize_where[ModelName,'condition']`

Anonymize records **matching a condition**.

**Usage:**
```bash
# Anonymize old users
rake column_anonymizer:anonymize_where[User,'created_at < "2023-01-01"']

# Anonymize inactive employees
rake column_anonymizer:anonymize_where[Employee,'status = "inactive"']

# Anonymize by ID range
rake column_anonymizer:anonymize_where[Patient,'id < 1000']
```

**Example Output:**
```
📋 Anonymizing User where created_at < "2023-01-01"
Columns: email, phone, ssn
Matching records: 342
⚠️ This will anonymize 342 record(s). Continue? (y/N): y
✅ Anonymized 342 record(s)
```

**Features:**
- ✅ Targets specific records
- ✅ Requires confirmation before processing
- ✅ Supports any WHERE clause (string-literal quoting depends on your database; PostgreSQL, for example, expects single quotes around string literals)
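
The confirmation step can be pictured with a small sketch. This is not the gem's code: `confirmed?` and its `input`/`output` parameters are hypothetical names, and the behavior on a closed stdin (declining) is an assumption chosen because it is the safe default for a destructive task.

```ruby
require "stringio"

# Hypothetical helper: ask before a destructive run.
# Only "y" or "yes" (any case) confirms; everything else declines.
def confirmed?(count, input: $stdin, output: $stdout)
  output.print "⚠️  This will anonymize #{count} record(s). Continue? (y/N): "
  answer = input.gets # nil when stdin is closed, e.g. under a scheduler
  %w[y yes].include?(answer.to_s.strip.downcase)
end

# Simulated terminal input, so the sketch runs without a real prompt.
puts confirmed?(342, input: StringIO.new("y\n"), output: StringIO.new)
puts confirmed?(342, input: StringIO.new("\n"), output: StringIO.new)
```

Defaulting to "no" on empty or missing input matches the `(y/N)` convention in the example output above.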

---

### 4. `rake column_anonymizer:preview`

Preview what **would be anonymized** without making changes.

**Usage:**
```bash
rake column_anonymizer:preview
```

**Example Output:**
```
🔍 Anonymization Preview
======================================================================

User:
Columns to anonymize: email, phone, ssn
Records to process: 1,523
Types: email, phone, ssn

Example (first record):
email: john.doe@example.com
phone: +1-555-123-4567
ssn: 123-45-6789

Employee:
Columns to anonymize: employee_number, ssn
Records to process: 847
Types: employee_id, ssn

Example (first record):
employee_number: EMP-2023-12345
ssn: 987-65-4321

======================================================================
💡 Run 'rake column_anonymizer:anonymize_all' to perform anonymization
======================================================================
```

**Features:**
- ✅ Safe dry-run
- ✅ Shows what will be processed
- ✅ Shows example values
- ✅ No data changes

---

### 5. `rake column_anonymizer:stats`

Show **statistics** about your encrypted columns.

**Usage:**
```bash
rake column_anonymizer:stats
```

**Example Output:**
```
📊 Column Anonymizer Statistics
======================================================================

Models configured: 3
Total encrypted columns: 6

Detailed breakdown:

User:
Columns: 3 (email, phone, ssn)
Records: 1,523
Types: email, phone, ssn

Employee:
Columns: 2 (employee_number, ssn)
Records: 847
Types: employee_id, ssn

Patient:
Columns: 1 (medical_record_number)
Records: 2,104
Types: mrn

======================================================================
Total records across all models: 4,474
======================================================================
```

**Features:**
- ✅ Overview of all models
- ✅ Column counts
- ✅ Record counts
- ✅ Type information
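
To illustrate how the model and column counts in that output could be derived from configuration, here is a minimal sketch. The YAML layout (model name mapping to column name and anonymizer type) is an assumption made for this example, not the documented format of `config/encrypted_columns.yml`, and `column_stats` is a hypothetical helper. Record counts need a database and are left out.

```ruby
require "yaml"

# ASSUMED layout for illustration only: model => { column => type }.
CONFIG_YAML = <<~YAML
  User:
    email: email
    phone: phone
    ssn: ssn
  Employee:
    employee_number: employee_id
    ssn: ssn
  Patient:
    medical_record_number: mrn
YAML

# Hypothetical helper: summarize a parsed configuration hash.
def column_stats(config)
  {
    models: config.size,
    columns: config.values.sum { |columns| columns.size },
    types: config.transform_values { |columns| columns.values.uniq }
  }
end

stats = column_stats(YAML.safe_load(CONFIG_YAML))
puts "Models configured: #{stats[:models]}"
puts "Total encrypted columns: #{stats[:columns]}"
```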

---

## Common Workflows

### Initial Setup and Testing

```bash
# 1. Preview what will be anonymized
rake column_anonymizer:preview

# 2. Check statistics
rake column_anonymizer:stats

# 3. Test on one model first
rake column_anonymizer:anonymize_model[User]

# 4. If successful, anonymize everything
rake column_anonymizer:anonymize_all
```

### Anonymizing Old Data

```bash
# Anonymize users created before a cutoff date (here: 2024-01-01)
rake column_anonymizer:anonymize_where[User,'created_at < "2024-01-01"']

# Anonymize deleted accounts
rake column_anonymizer:anonymize_where[User,'deleted_at IS NOT NULL']
```

### Production Deployment

```bash
# 1. Preview on staging
RAILS_ENV=staging rake column_anonymizer:preview

# 2. Check stats
RAILS_ENV=staging rake column_anonymizer:stats

# 3. Anonymize in production (take a backup first!)
RAILS_ENV=production rake column_anonymizer:anonymize_all
```

### Regular Maintenance

```bash
# Weekly: anonymize accounts with no login since the cutoff date
# (the date is fixed text; update it on each run)
rake column_anonymizer:anonymize_where[User,'last_login_at < "2025-02-01"']
```
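
A hard-coded cutoff date goes stale in a recurring job. One way around that, sketched here under assumptions, is to compute the date and build the task string in Ruby. `anonymize_where_task` is a hypothetical helper, not part of the gem; the task name and condition format are copied from the examples above.

```ruby
require "date"

# Hypothetical helper: build the anonymize_where invocation for rows whose
# `column` is older than `years` years, relative to `today`.
def anonymize_where_task(model, column, years:, today: Date.today)
  cutoff = today.prev_year(years)
  %(column_anonymizer:anonymize_where[#{model},#{column} < "#{cutoff.iso8601}"])
end

# Fixed `today` so the output is reproducible.
puts anonymize_where_task("User", "last_login_at", years: 1, today: Date.new(2026, 2, 1))
# prints: column_anonymizer:anonymize_where[User,last_login_at < "2025-02-01"]
```

The resulting string can be handed to `rake` as a single argument, for example via `system("rake", task)`, which avoids shell quoting altogether.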

---

## Advanced Usage

### With Custom Generators

If you've registered custom generators:

```ruby
# config/initializers/column_anonymizer.rb
ColumnAnonymizer::Anonymizer.register(:credit_card) do
  "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
end
```

The Rake tasks will automatically use your custom generators:

```bash
rake column_anonymizer:anonymize_all
# Uses your custom credit_card generator
```
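
Conceptually, registering a generator stores a block under a type name so it can be looked up later. The sketch below is an illustrative stand-in, not the gem's implementation: `GeneratorRegistry`, its methods, and the `"[ANONYMIZED]"` fallback are all assumptions made for this example.

```ruby
# Illustrative stand-in, NOT ColumnAnonymizer's code: a type => block registry.
class GeneratorRegistry
  def initialize
    @generators = {}
  end

  # Store the block under the given type name.
  def register(type, &block)
    raise ArgumentError, "block required" unless block

    @generators[type.to_sym] = block
  end

  # Call the registered block, or fall back to a fixed placeholder
  # (the fallback value is an assumption for this sketch).
  def generate(type)
    generator = @generators[type.to_sym]
    generator ? generator.call : "[ANONYMIZED]"
  end
end

registry = GeneratorRegistry.new
registry.register(:credit_card) { "XXXX-XXXX-XXXX-#{rand(1000..9999)}" }
puts registry.generate(:credit_card)
puts registry.generate(:unknown)
```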

### Batch Processing Large Datasets

For very large tables, consider processing in batches:

```bash
# Process by ID ranges
rake column_anonymizer:anonymize_where[User,'id BETWEEN 1 AND 10000']
rake column_anonymizer:anonymize_where[User,'id BETWEEN 10001 AND 20000']
# etc.
```
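
Writing the ranges by hand gets tedious for large tables. The ranges can be generated instead, as in this small sketch; `id_range_conditions` is a hypothetical helper, and the maximum ID of 25,000 is an arbitrary example value.

```ruby
# Hypothetical helper: split 1..max_id into BETWEEN conditions covering
# `size` ids each; the last range is clipped to max_id.
def id_range_conditions(max_id, size)
  (1..max_id).step(size).map do |first|
    last = [first + size - 1, max_id].min
    "id BETWEEN #{first} AND #{last}"
  end
end

# Print one rake command per batch.
id_range_conditions(25_000, 10_000).each do |condition|
  puts "rake column_anonymizer:anonymize_where[User,'#{condition}']"
end
```

`BETWEEN` is inclusive on both ends, so each range starts one past the previous range's end and no ID is processed twice.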

### Scheduling with Cron

`anonymize_where` asks for confirmation before processing, so check how the prompt behaves without a terminal before scheduling it.

```bash
# Add to crontab
# Anonymize old records weekly on Sunday at 2 AM
# (NOW() - INTERVAL 1 YEAR is MySQL syntax; adjust for your database)
0 2 * * 0 cd /app && RAILS_ENV=production rake column_anonymizer:anonymize_where[User,'created_at < NOW() - INTERVAL 1 YEAR']
```

### Using in Deployment Scripts

```ruby
# config/deploy.rb (Capistrano)
after 'deploy:migrate', 'column_anonymizer:anonymize_old_data'

namespace :column_anonymizer do
  task :anonymize_old_data do
    on roles(:db) do
      within release_path do
        with rails_env: fetch(:rails_env) do
          execute :rake, 'column_anonymizer:anonymize_where[User,\'created_at < "2023-01-01"\']'
        end
      end
    end
  end
end
```

---

## Error Handling

A failure on one record is reported, and processing continues with the next record:

```
📋 Processing User...
Columns: email, phone, ssn
Records: 1,523
❌ Error anonymizing User ID 42: Validation failed
❌ Error anonymizing User ID 105: undefined method 'email='
✅ Anonymized 1,521 record(s)
⚠️ 2 error(s)
```

**Common errors:**
- Model class not found → Check model name spelling
- Validation errors → Check model validations
- Missing column → Check YAML config matches model
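
The report-and-continue behavior shown in the error output corresponds to rescuing per record rather than around the whole run. The following is a minimal sketch of that pattern with in-memory stand-ins. `Record` and `anonymize_each` are hypothetical names, and in a Rails app the records would be ActiveRecord models.

```ruby
# Stand-in record type for this sketch.
Record = Struct.new(:id, :email)

# Yield each record to the block; collect failures instead of aborting.
# Returns [number of successes, list of error messages].
def anonymize_each(records)
  errors = []
  done = 0
  records.each do |record|
    begin
      yield record
      done += 1
    rescue StandardError => e
      errors << "Error anonymizing ID #{record.id}: #{e.message}"
    end
  end
  [done, errors]
end

records = [Record.new(1, "a@example.com"), Record.new(42, nil), Record.new(3, "c@example.com")]
done, errors = anonymize_each(records) do |record|
  raise "Validation failed" if record.email.nil?

  record.email = "anon#{record.id}@example.com"
end

puts "✅ Anonymized #{done} record(s)"
errors.each { |message| puts "❌ #{message}" }
puts "⚠️  #{errors.size} error(s)" unless errors.empty?
```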

---

## Performance Tips

### For Large Tables

1. **Rely on batching** - `find_each` loads records in batches of 1,000 by default
2. **Process during low traffic** - Run during off-peak hours
3. **Use conditions** - Process subsets with `anonymize_where`
4. **Monitor progress** - Watch the progress indicators
5. **Check database locks** - Ensure no long-running queries
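
Batching and progress reporting can be pictured with an in-memory analogue. This sketch uses `each_slice` in place of ActiveRecord's `find_each`, and `process_in_batches` is a hypothetical helper. The batch size of 1,000 and the report interval of 100 mirror the figures mentioned in this guide.

```ruby
# Walk `ids` in batches and record a progress line every `every` items,
# plus one for the final item. Returns the lines instead of printing them.
def process_in_batches(ids, batch_size: 1000, every: 100)
  progress = []
  processed = 0
  ids.each_slice(batch_size) do |batch|
    batch.each do |_id|
      processed += 1 # real code would anonymize the record here
      if (processed % every).zero? || processed == ids.size
        progress << "Progress: #{processed}/#{ids.size}"
      end
    end
  end
  progress
end

lines = process_in_batches((1..1523).to_a)
puts lines.first
puts lines.last
```

Only one batch is held in memory at a time, which is the property that makes the same approach workable on large tables.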

### Optimization

```bash
# Process specific date ranges
rake column_anonymizer:anonymize_where[User,'created_at BETWEEN "2020-01-01" AND "2020-12-31"']

# Use indexes
# Add indexes on commonly queried columns (created_at, status, etc.)
```

---

## Safety Checklist

Before running in production:

- ✅ **Backup your database** - Always have a recent backup
- ✅ **Test on staging** - Run tasks on staging first
- ✅ **Preview first** - Use `preview` task to see what will change
- ✅ **Check stats** - Use `stats` task to understand scope
- ✅ **Test one model** - Use `anonymize_model` on one model first
- ✅ **Verify custom generators** - Ensure custom generators work correctly
- ✅ **Check validations** - Ensure anonymized data passes validations
- ✅ **Monitor performance** - Watch database and app performance
- ✅ **Have rollback plan** - Know how to restore from backup

---

## Integration Examples

### With Heroku Scheduler

```bash
# Add to Heroku Scheduler (daily at midnight)
# (NOW() - INTERVAL 30 DAY is MySQL syntax; adjust for your database)
rake column_anonymizer:anonymize_where[User,'deleted_at < NOW() - INTERVAL 30 DAY']
```

### With Sidekiq/Background Jobs

```ruby
# app/jobs/anonymize_old_users_job.rb
class AnonymizeOldUsersJob < ApplicationJob
  queue_as :default

  def perform
    cutoff = 1.year.ago.to_date
    # The argument-list form of system skips the shell, so the task string
    # needs no shell quoting; exception: true (Ruby 2.6+) raises on failure.
    system("rake", "column_anonymizer:anonymize_where[User,created_at < \"#{cutoff}\"]", exception: true)
  end
end
```

### With Rails Runner

```bash
# One-off command
rails runner "
  schema = ColumnAnonymizer::SchemaLoader.load_schema
  User.where('created_at < ?', 2.years.ago).find_each do |user|
    ColumnAnonymizer::Anonymizer.anonymize_model!(user)
  end
"
```

---

## Troubleshooting

### Task Not Found

```bash
# Error: Don't know how to build task 'column_anonymizer:anonymize_all'

# Solution: Ensure the rake file is packaged with the gem and loaded.
# In the gemspec (Ruby), include .rake files:
spec.files = Dir['lib/**/*.{rb,rake}']

# Then confirm the tasks are listed:
rake -T column_anonymizer
```

### Model Not Found

```bash
# Error: Model class 'User' not found

# Solution: Check model name in encrypted_columns.yml
# Must match exact class name (case-sensitive)
```

### No Records Anonymized

Check that:

1. Models have encrypted columns defined
2. YAML config matches model structure
3. Records actually exist
4. Models respond to `encrypted_columns_metadata`

---

## Quick Reference

| Task | Purpose | Example |
|------|---------|---------|
| `anonymize_all` | Anonymize all models | `rake column_anonymizer:anonymize_all` |
| `anonymize_model[Model]` | Anonymize one model | `rake column_anonymizer:anonymize_model[User]` |
| `anonymize_where[Model,'condition']` | Conditional anonymization | `rake column_anonymizer:anonymize_where[User,'id < 1000']` |
| `preview` | Dry run | `rake column_anonymizer:preview` |
| `stats` | Show statistics | `rake column_anonymizer:stats` |

---

## Summary

The Rake tasks provide:

- ✅ **Bulk anonymization** - Process all models at once
- ✅ **Selective anonymization** - Target specific models or records
- ✅ **Safe preview** - Check before making changes
- ✅ **Progress tracking** - Monitor long-running operations
- ✅ **Error handling** - Graceful error recovery
- ✅ **Statistics** - Understand your data

Start with `preview` and `stats`, then use `anonymize_all` or `anonymize_model` as needed! 🚀