column_anonymizer 0.1.0

@@ -0,0 +1,469 @@
1
+ # Rake Tasks Guide
2
+
3
+ ## Overview
4
+
5
+ Column Anonymizer provides Rake tasks for anonymizing encrypted columns in bulk. Each task reads `config/encrypted_columns.yml` and processes the models and columns listed there.
6
+
7
+ ---
8
+
9
+ ## Available Tasks
10
+
11
+ ### 1. `rake column_anonymizer:anonymize_all`
12
+
13
+ Anonymize **all records** for **all models** defined in `config/encrypted_columns.yml`.
14
+
15
+ **Usage:**
16
+ ```bash
17
+ rake column_anonymizer:anonymize_all
18
+ ```
19
+
20
+ **Example Output:**
21
+ ```
22
+ 🔍 Found 3 model(s) in configuration
23
+ ======================================================================
24
+
25
+ 📋 Processing User...
26
+ Columns: email, phone, ssn
27
+ Records: 1,523
28
+ ✅ Anonymized 1,523 record(s)
29
+
30
+ 📋 Processing Employee...
31
+ Columns: employee_number, ssn
32
+ Records: 847
33
+ ✅ Anonymized 847 record(s)
34
+
35
+ 📋 Processing Patient...
36
+ Columns: medical_record_number
37
+ Records: 2,104
38
+ ✅ Anonymized 2,104 record(s)
39
+
40
+ ======================================================================
41
+ 🎉 Anonymization complete!
42
+ Total records anonymized: 4,474
43
+ ======================================================================
44
+ ```
45
+
46
+ **Features:**
47
+ - ✅ Processes all models in configuration
48
+ - ✅ Shows progress every 100 records
49
+ - ✅ Handles errors gracefully
50
+ - ✅ Shows summary statistics
51
+
52
+ ---
53
+
54
+ ### 2. `rake column_anonymizer:anonymize_model[ModelName]`
55
+
56
+ Anonymize **all records** for a **specific model**.
57
+
58
+ **Usage:**
59
+ ```bash
60
+ rake column_anonymizer:anonymize_model[User]
61
+ rake column_anonymizer:anonymize_model[Employee]
62
+ ```
63
+
64
+ **Example Output:**
65
+ ```
66
+ 📋 Anonymizing User
67
+ Columns: email, phone, ssn
68
+ Records: 1,523
69
+ Progress: 1523/1523
70
+ ✅ Anonymized 1,523 record(s)
71
+ ```
72
+
73
+ **Features:**
74
+ - ✅ Focus on one model
75
+ - ✅ Faster than `anonymize_all` when only one model needs processing
76
+ - ✅ Shows progress
77
+
78
+ ---
79
+
80
+ ### 3. `rake column_anonymizer:anonymize_where[ModelName,'condition']`
81
+
82
+ Anonymize records **matching a condition**.
83
+
84
+ **Usage:**
85
+ ```bash
86
+ # Anonymize old users (PostgreSQL treats double-quoted text as an identifier, so use single-quoted literals there)
87
+ rake column_anonymizer:anonymize_where[User,'created_at < "2023-01-01"']
88
+
89
+ # Anonymize inactive employees
90
+ rake column_anonymizer:anonymize_where[Employee,'status = "inactive"']
91
+
92
+ # Anonymize by ID range
93
+ rake column_anonymizer:anonymize_where[Patient,'id < 1000']
94
+ ```
95
+
96
+ **Example Output:**
97
+ ```
98
+ 📋 Anonymizing User where created_at < "2023-01-01"
99
+ Columns: email, phone, ssn
100
+ Matching records: 342
101
+ ⚠️ This will anonymize 342 record(s). Continue? (y/N): y
102
+ ✅ Anonymized 342 record(s)
103
+ ```
104
+
105
+ **Features:**
106
+ - ✅ Target specific records
107
+ - ✅ Requires confirmation before processing
108
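+ - ✅ Accepts a SQL WHERE fragment (Rake splits bracketed arguments on commas, so a condition containing an unescaped comma is cut short)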
109
+
110
+ ---
111
+
112
+ ### 4. `rake column_anonymizer:preview`
113
+
114
+ Preview what **would be anonymized** without making changes.
115
+
116
+ **Usage:**
117
+ ```bash
118
+ rake column_anonymizer:preview
119
+ ```
120
+
121
+ **Example Output:**
122
+ ```
123
+ 🔍 Anonymization Preview
124
+ ======================================================================
125
+
126
+ User:
127
+ Columns to anonymize: email, phone, ssn
128
+ Records to process: 1,523
129
+ Types: email, phone, ssn
130
+
131
+ Example (first record):
132
+ email: john.doe@example.com
133
+ phone: +1-555-123-4567
134
+ ssn: 123-45-6789
135
+
136
+ Employee:
137
+ Columns to anonymize: employee_number, ssn
138
+ Records to process: 847
139
+ Types: employee_id, ssn
140
+
141
+ Example (first record):
142
+ employee_number: EMP-2023-12345
143
+ ssn: 987-65-4321
144
+
145
+ ======================================================================
146
+ 💡 Run 'rake column_anonymizer:anonymize_all' to perform anonymization
147
+ ======================================================================
148
+ ```
149
+
150
+ **Features:**
151
+ - ✅ Safe dry-run
152
+ - ✅ Shows what will be processed
153
+ - ✅ Shows example values
154
+ - ✅ No data changes
155
+
156
+ ---
157
+
158
+ ### 5. `rake column_anonymizer:stats`
159
+
160
+ Show **statistics** about your encrypted columns.
161
+
162
+ **Usage:**
163
+ ```bash
164
+ rake column_anonymizer:stats
165
+ ```
166
+
167
+ **Example Output:**
168
+ ```
169
+ 📊 Column Anonymizer Statistics
170
+ ======================================================================
171
+
172
+ Models configured: 3
173
+ Total encrypted columns: 6
174
+
175
+ Detailed breakdown:
176
+
177
+ User:
178
+ Columns: 3 (email, phone, ssn)
179
+ Records: 1,523
180
+ Types: email, phone, ssn
181
+
182
+ Employee:
183
+ Columns: 2 (employee_number, ssn)
184
+ Records: 847
185
+ Types: employee_id, ssn
186
+
187
+ Patient:
188
+ Columns: 1 (medical_record_number)
189
+ Records: 2,104
190
+ Types: mrn
191
+
192
+ ======================================================================
193
+ Total records across all models: 4,474
194
+ ======================================================================
195
+ ```
196
+
197
+ **Features:**
198
+ - ✅ Overview of all models
199
+ - ✅ Column counts
200
+ - ✅ Record counts
201
+ - ✅ Type information
202
+
203
+ ---
204
+
205
+ ## Common Workflows
206
+
207
+ ### Initial Setup and Testing
208
+
209
+ ```bash
210
+ # 1. Preview what will be anonymized
211
+ rake column_anonymizer:preview
212
+
213
+ # 2. Check statistics
214
+ rake column_anonymizer:stats
215
+
216
+ # 3. Test on one model first
217
+ rake column_anonymizer:anonymize_model[User]
218
+
219
+ # 4. If successful, anonymize everything
220
+ rake column_anonymizer:anonymize_all
221
+ ```
222
+
223
+ ### Anonymizing Old Data
224
+
225
+ ```bash
226
+ # Anonymize users created more than 2 years ago (update the cutoff date to match)
227
+ rake column_anonymizer:anonymize_where[User,'created_at < "2024-01-01"']
228
+
229
+ # Anonymize deleted accounts
230
+ rake column_anonymizer:anonymize_where[User,'deleted_at IS NOT NULL']
231
+ ```
232
+
233
+ ### Production Deployment
234
+
235
+ ```bash
236
+ # 1. Preview on staging
237
+ RAILS_ENV=staging rake column_anonymizer:preview
238
+
239
+ # 2. Check stats
240
+ RAILS_ENV=staging rake column_anonymizer:stats
241
+
242
+ # 3. Anonymize in production (with backup!)
243
+ RAILS_ENV=production rake column_anonymizer:anonymize_all
244
+ ```
245
+
246
+ ### Regular Maintenance
247
+
248
+ ```bash
249
+ # Weekly: anonymize accounts with no login in the past year (update the cutoff date on each run)
250
+ rake column_anonymizer:anonymize_where[User,'last_login_at < "2025-02-01"']
251
+ ```
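
The cutoff dates in the commands above are hard-coded, so they drift as time passes. A minimal Ruby sketch for deriving the condition from a reference date follows; the `cutoff_condition` helper is hypothetical and not part of Column Anonymizer.

```ruby
require "date"

# Hypothetical helper (not part of Column Anonymizer): builds a condition
# such as `last_login_at < "2024-02-01"` relative to a reference date.
def cutoff_condition(column, years_ago, today: Date.today)
  cutoff = today.prev_year(years_ago) # same calendar day, N years earlier
  %(#{column} < "#{cutoff.iso8601}")
end

condition = cutoff_condition("last_login_at", 1)
puts "rake column_anonymizer:anonymize_where[User,'#{condition}']"
```

Passing `today:` explicitly makes the helper easy to check against a known date before it is used in a scheduled job.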
252
+
253
+ ---
254
+
255
+ ## Advanced Usage
256
+
257
+ ### With Custom Generators
258
+
259
+ If you've registered custom generators:
260
+
261
+ ```ruby
262
+ # config/initializers/column_anonymizer.rb
263
+ ColumnAnonymizer::Anonymizer.register(:credit_card) do
264
+ "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
265
+ end
266
+ ```
267
+
268
+ The Rake tasks will automatically use your custom generators:
269
+
270
+ ```bash
271
+ rake column_anonymizer:anonymize_all
272
+ # Uses your custom credit_card generator
273
+ ```
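
Before running a bulk task, it may be worth checking that a custom generator produces the expected format. The sketch below copies the body of the `credit_card` block above into a plain lambda so it can be exercised without loading the gem; the variable names are illustrative only.

```ruby
# Same block body as the credit_card generator above, held in a lambda so it
# can be called directly without registering it.
credit_card_generator = -> { "XXXX-XXXX-XXXX-#{rand(1000..9999)}" }

samples = Array.new(5) { credit_card_generator.call }
samples.each do |value|
  # Every value should be masked except for a random four-digit suffix.
  raise "unexpected format: #{value}" unless value.match?(/\AXXXX-XXXX-XXXX-\d{4}\z/)
end
puts samples.first
```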
274
+
275
+ ### Batch Processing Large Datasets
276
+
277
+ For very large tables, consider processing in batches:
278
+
279
+ ```bash
280
+ # Process by ID ranges
281
+ rake column_anonymizer:anonymize_where[User,'id BETWEEN 1 AND 10000']
282
+ rake column_anonymizer:anonymize_where[User,'id BETWEEN 10001 AND 20000']
283
+ # etc.
284
+ ```
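
Writing the ID windows by hand gets tedious for large tables. A small Ruby sketch that generates one `anonymize_where` command per window is shown here; `batch_commands` is a hypothetical helper, and the maximum ID is assumed to be known (for example from `User.maximum(:id)`).

```ruby
# Hypothetical helper: returns one anonymize_where command per ID window.
def batch_commands(model, max_id, batch_size)
  (1..max_id).step(batch_size).map do |first_id|
    # Clamp the final window so it does not run past max_id.
    last_id = [first_id + batch_size - 1, max_id].min
    "rake column_anonymizer:anonymize_where[#{model},'id BETWEEN #{first_id} AND #{last_id}']"
  end
end

puts batch_commands("User", 25_000, 10_000)
```

Because `anonymize_where` asks for confirmation, each generated command still prompts before it processes its window.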
285
+
286
+ ### Scheduling with Cron
287
+
288
+ ```bash
289
+ # Add to crontab
290
+ # Anonymize old records weekly on Sunday at 2 AM (the INTERVAL expression is MySQL syntax; adapt it for other databases)
291
+ 0 2 * * 0 cd /app && RAILS_ENV=production rake column_anonymizer:anonymize_where[User,'created_at < NOW() - INTERVAL 1 YEAR']
292
+ ```
293
+
294
+ ### Using in Deployment Scripts
295
+
296
+ ```ruby
297
+ # config/deploy.rb (Capistrano)
298
+ after 'deploy:migrate', 'column_anonymizer:anonymize_old_data'
299
+
300
+ namespace :column_anonymizer do
301
+ task :anonymize_old_data do
302
+ on roles(:db) do
303
+ within release_path do
304
+ with rails_env: fetch(:rails_env) do
305
+ execute :rake, 'column_anonymizer:anonymize_where[User,\'created_at < "2023-01-01"\']'
306
+ end
307
+ end
308
+ end
309
+ end
310
+ end
311
+ ```
312
+
313
+ ---
314
+
315
+ ## Error Handling
316
+
317
+ When a record fails, the task logs the error, continues with the remaining records, and reports the error count at the end:
318
+
319
+ ```
320
+ 📋 Processing User...
321
+ Columns: email, phone, ssn
322
+ Records: 1,523
323
+ ❌ Error anonymizing User ID 42: Validation failed
324
+ ❌ Error anonymizing User ID 105: undefined method 'email='
325
+ ✅ Anonymized 1,521 record(s)
326
+ ⚠️ 2 error(s)
327
+ ```
328
+
329
+ **Common errors:**
330
+ - Model class not found → Check model name spelling
331
+ - Validation errors → Check model validations
332
+ - Missing column → Check YAML config matches model
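
The log-and-continue behaviour shown in the output above can be sketched in plain Ruby. This is an illustration of the pattern under stated assumptions, not the gem's actual implementation: `FakeRecord` and `anonymize_with_report` are hypothetical stand-ins.

```ruby
# Stand-in for a model instance; a real task would work with ActiveRecord objects.
FakeRecord = Struct.new(:id, :email) do
  def anonymize!
    raise ArgumentError, "Validation failed" if email.nil?
    self.email = "user#{id}@example.invalid"
  end
end

# Anonymizes each record, collecting failures instead of aborting the run.
def anonymize_with_report(records)
  succeeded = 0
  errors = []
  records.each do |record|
    begin
      record.anonymize!
      succeeded += 1
    rescue StandardError => e
      errors << "Error anonymizing ID #{record.id}: #{e.message}"
    end
  end
  [succeeded, errors]
end

records = [
  FakeRecord.new(1, "a@example.com"),
  FakeRecord.new(42, nil),
  FakeRecord.new(3, "c@example.com")
]
succeeded, errors = anonymize_with_report(records)
errors.each { |message| puts message }
puts "Anonymized #{succeeded} record(s), #{errors.size} error(s)"
```

Rescuing per record keeps one bad row from stopping the whole run, at the cost of leaving the failed rows for a follow-up pass.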
333
+
334
+ ---
335
+
336
+ ## Performance Tips
337
+
338
+ ### For Large Tables
339
+
340
+ 1. **Use batching** - `find_each` loads records in batches of 1,000 by default
341
+ 2. **Process during low traffic** - Run during off-peak hours
342
+ 3. **Use conditions** - Process subsets with `anonymize_where`
343
+ 4. **Monitor progress** - Watch the progress indicators
344
+ 5. **Check database locks** - Ensure no long-running queries hold locks on the target tables
345
+
346
+ ### Optimization
347
+
348
+ ```bash
349
+ # Process specific date ranges
350
+ rake column_anonymizer:anonymize_where[User,'created_at BETWEEN "2020-01-01" AND "2020-12-31"']
351
+
352
+ # Use indexes
353
+ # Add indexes on commonly queried columns (created_at, status, etc.)
354
+ ```
355
+
356
+ ---
357
+
358
+ ## Safety Checklist
359
+
360
+ Before running in production:
361
+
362
+ - ✅ **Backup your database** - Always have a recent backup
363
+ - ✅ **Test on staging** - Run tasks on staging first
364
+ - ✅ **Preview first** - Use `preview` task to see what will change
365
+ - ✅ **Check stats** - Use `stats` task to understand scope
366
+ - ✅ **Test one model** - Use `anonymize_model` on one model first
367
+ - ✅ **Verify custom generators** - Ensure custom generators work correctly
368
+ - ✅ **Check validations** - Ensure anonymized data passes validations
369
+ - ✅ **Monitor performance** - Watch database and app performance
370
+ - ✅ **Have rollback plan** - Know how to restore from backup
371
+
372
+ ---
373
+
374
+ ## Integration Examples
375
+
376
+ ### With Heroku Scheduler
377
+
378
+ ```bash
379
+ # Add to Heroku Scheduler (daily at midnight); the INTERVAL expression is MySQL syntax, so adapt it for PostgreSQL
380
+ rake column_anonymizer:anonymize_where[User,'deleted_at < NOW() - INTERVAL 30 DAY']
381
+ ```
382
+
383
+ ### With Sidekiq/Background Jobs
384
+
385
+ ```ruby
386
+ # app/jobs/anonymize_old_users_job.rb
387
+ class AnonymizeOldUsersJob < ApplicationJob
388
+ queue_as :default
389
+
390
+ def perform
391
+ system("rake column_anonymizer:anonymize_where[User,'created_at < \"#{1.year.ago.to_date}\"']")
392
+ end
393
+ end
394
+ ```
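
The job above hands a single string to `system`, which runs it through the shell and silently ignores a failing exit status. One possible alternative, sketched below, passes the program and the task as separate arguments (no shell, so one less layer of quoting) and lets `exception: true` (Ruby 2.6+) raise on failure. The `anonymize_where_argv` helper is hypothetical.

```ruby
require "date"

# Hypothetical helper: returns an argv array suitable for Kernel#system.
def anonymize_where_argv(model, column, cutoff)
  task = %(column_anonymizer:anonymize_where[#{model},#{column} < "#{cutoff.iso8601}"])
  ["rake", task]
end

argv = anonymize_where_argv("User", "created_at", Date.new(2024, 1, 1))
p argv

# Inside the job's perform method one might then call:
#   system(*argv, exception: true)
```

Since `anonymize_where` asks for confirmation before processing, a non-interactive job also needs some way to answer that prompt.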
395
+
396
+ ### With Rails Runner
397
+
398
+ ```bash
399
+ # One-off command
400
+ rails runner "
401
+ schema = ColumnAnonymizer::SchemaLoader.load_schema
402
+ User.where('created_at < ?', 2.years.ago).find_each do |user|
403
+ ColumnAnonymizer::Anonymizer.anonymize_model!(user)
404
+ end
405
+ "
406
+ ```
407
+
408
+ ---
409
+
410
+ ## Troubleshooting
411
+
412
+ ### Task Not Found
413
+
414
+ ```bash
415
+ # Error: Don't know how to build task 'column_anonymizer:anonymize_all'
416
+
417
+ # Solution: Ensure rake file is loaded
418
+ # For gems, add to gemspec:
419
+ spec.files = Dir['lib/**/*.{rb,rake}']
420
+
421
+ # Then confirm the tasks are registered:
422
+ rake -T column_anonymizer
423
+ ```
424
+
425
+ ### Model Not Found
426
+
427
+ ```bash
428
+ # Error: Model class 'User' not found
429
+
430
+ # Solution: Check model name in encrypted_columns.yml
431
+ # Must match exact class name (case-sensitive)
432
+ ```
433
+
434
+ ### No Records Anonymized
435
+
436
+ ```
437
+ # Check:
438
+ 1. Models have encrypted columns defined
439
+ 2. YAML config matches model structure
440
+ 3. Records actually exist
441
+ 4. Models respond to encrypted_columns_metadata
442
+ ```
443
+
444
+ ---
445
+
446
+ ## Quick Reference
447
+
448
+ | Task | Purpose | Example |
449
+ |------|---------|---------|
450
+ | `anonymize_all` | Anonymize all models | `rake column_anonymizer:anonymize_all` |
451
+ | `anonymize_model[Model]` | Anonymize one model | `rake column_anonymizer:anonymize_model[User]` |
452
+ | `anonymize_where[Model,'condition']` | Conditional anonymization | `rake column_anonymizer:anonymize_where[User,'id < 1000']` |
453
+ | `preview` | Dry run | `rake column_anonymizer:preview` |
454
+ | `stats` | Show statistics | `rake column_anonymizer:stats` |
455
+
456
+ ---
457
+
458
+ ## Summary
459
+
460
+ The Rake tasks provide:
461
+
462
+ ✅ **Bulk anonymization** - Process all models at once
463
+ ✅ **Selective anonymization** - Target specific models or records
464
+ ✅ **Safe preview** - Check before making changes
465
+ ✅ **Progress tracking** - Monitor long-running operations
466
+ ✅ **Error handling** - Graceful error recovery
467
+ ✅ **Statistics** - Understand your data
468
+
469
+ Start with `preview` and `stats`, then use `anonymize_all` or `anonymize_model` as needed! 🚀