column_anonymizer 0.1.0

data/README.md ADDED
@@ -0,0 +1,389 @@
1
+ # Column Anonymizer
2
+
3
+ A Rails gem for intelligently anonymizing encrypted data with YAML-based configuration.
4
+
5
+ ## Features
6
+
7
+ - 🔒 **Seamless Rails Integration**: Works with Rails 7+ Active Record Encryption
8
+ - 📝 **YAML-Based Configuration**: Centralized, version-controlled column type definitions
9
+ - 🔍 **Automatic Model Scanner**: Discovers encrypted columns automatically
10
+ - 🧠 **Intelligent Type Guessing**: Smart detection of data types from column names
11
+ - 🎨 **Built-in Generators**: Pre-configured anonymizers for common data types
12
+ - 🔄 **Safe Merging**: Updates config without overwriting existing entries
13
+
14
+ ## Installation
15
+
16
+ Add this line to your application's Gemfile:
17
+
18
+ ```ruby
19
+ gem 'column_anonymizer'
20
+ ```
21
+
22
+ And then execute:
23
+
24
+ ```bash
25
+ bundle install
26
+ ```
27
+
28
+ Or install from source:
29
+
30
+ ```bash
31
+ cd column_anonymizer
32
+ gem build column_anonymizer.gemspec
33
+ gem install column_anonymizer-0.1.0.gem
34
+ ```
35
+
36
+ ## Quick Start
37
+
38
+ ### 1. Install Configuration File
39
+
40
+ ```bash
41
+ rails generate column_anonymizer:install
42
+ ```
43
+
44
+ This creates `config/encrypted_columns.yml`.
45
+
46
+ ### 2. Scan Your Models (Recommended)
47
+
48
+ Automatically discover encrypted columns:
49
+
50
+ ```bash
51
+ rails generate column_anonymizer:scan
52
+ ```
53
+
54
+ Or install and scan in one step:
55
+
56
+ ```bash
57
+ rails generate column_anonymizer:install --scan
58
+ ```
59
+
60
+ ### 3. Use Standard Rails Encryption
61
+
62
+ ```ruby
63
+ class User < ApplicationRecord
64
+   encrypts :email
65
+   encrypts :phone_number
66
+   encrypts :ssn
67
+ end
68
+ ```
69
+
70
+ ### 4. Anonymize Data
71
+
72
+ ```ruby
73
+ user = User.first
74
+ ColumnAnonymizer::Anonymizer.anonymize_model!(user)
75
+
76
+ # email becomes: user_a1b2c3d4@example.com
77
+ # phone_number becomes: +15551234567
78
+ # ssn becomes: 123-45-6789
79
+ ```
80
+
81
+ ## Automatic Model Scanning
82
+
83
+ The scan generator intelligently discovers all encrypted columns in your models:
84
+
85
+ ```bash
86
+ rails generate column_anonymizer:scan
87
+ ```
88
+
89
+ **Output:**
90
+ ```
91
+ 🔍 Scanning models for encrypted attributes...
92
+ ➕ Adding User.email as 'email'
93
+ ➕ Adding User.phone as 'phone'
94
+ ➕ Adding User.ssn as 'ssn'
95
+ ✅ Scanned 1 model(s) with encrypted attributes
96
+ 📝 Appended 3 new column(s) to config/encrypted_columns.yml
97
+ User: email, phone, ssn
98
+ ```
99
+
100
+ ### Append-Only Updates 🎯
101
+
102
+ The scanner **preserves your existing file structure**:
103
+ - ✅ Comments and custom formatting maintained
104
+ - ✅ Only new entries added (minimal git diffs)
105
+ - ✅ Existing configuration never modified
106
+ - ✅ New columns inserted under their model
107
+ - ✅ New models appended at the end
108
+
109
+ This suits teams that keep a hand-organized YAML file.
110
+
111
+ ### Intelligent Type Guessing
112
+
113
+ The scanner automatically detects appropriate anonymization types:
114
+
115
+ | Column Name Pattern | Detected Type | Example Output |
116
+ |---------------------|---------------|----------------|
117
+ | `email` | `:email` | `user_a1b2c3d4@example.com` |
118
+ | `phone`, `mobile`, `cell` | `:phone` | `+15551234567` |
119
+ | `ssn`, `social_security` | `:ssn` | `123-45-6789` |
120
+ | `first_name`, `fname` | `:first_name` | `John`, `Jane` |
121
+ | `last_name`, `surname` | `:last_name` | `Smith`, `Johnson` |
122
+ | `name`, `full_name` | `:name` | `Anonymous User abc123` |
123
+ | `address`, `street` | `:address` | `1234 Anonymous St` |
124
+ | `password`, `token` | `:text` | `Anonymized text abc123` |
125
+
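The pattern table above can be read as an ordered list of name checks with `:text` as the fallback. The following is a minimal sketch of that idea, not the gem's actual scanner: `TYPE_PATTERNS` and `guess_type` are hypothetical names, and the real matching rules may differ.

```ruby
# Hypothetical helper: maps a column name to an anonymization type by
# checking ordered name patterns. Not the gem's actual implementation.
TYPE_PATTERNS = [
  [/email/,               :email],
  [/phone|mobile|cell/,   :phone],
  [/ssn|social_security/, :ssn],
  [/first_name|fname/,    :first_name],
  [/last_name|surname/,   :last_name],
  [/name/,                :name],
  [/address|street/,      :address]
].freeze

# Returns the first matching type, or :text when nothing matches.
def guess_type(column_name)
  name = column_name.to_s.downcase
  match = TYPE_PATTERNS.find { |pattern, _type| name.match?(pattern) }
  match ? match.last : :text
end
```

Order matters in this sketch: the specific `first_name` and `last_name` patterns are checked before the generic `name` pattern, so `surname` resolves to `:last_name` rather than `:name`.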
126
+ ### Safe Configuration Updates
127
+
128
+ Re-running the scanner **only adds new entries** without modifying existing ones:
129
+
130
+ ```bash
131
+ rails generate column_anonymizer:scan
132
+ ```
133
+
134
+ **Output:**
135
+ ```
136
+ 🔍 Scanning models for encrypted attributes...
137
+ ℹ️ Skipping User.email (already configured as 'email')
138
+ ℹ️ Skipping User.phone (already configured as 'phone')
139
+ ➕ Adding User.date_of_birth as 'text'
140
+ ```
141
+
142
+ **What gets preserved:**
143
+ - ✅ Your comments in the YAML file
144
+ - ✅ Custom formatting and indentation
145
+ - ✅ Order of existing entries
146
+ - ✅ Manual type overrides
147
+
148
+ The scanner intelligently:
149
+ - Inserts new columns under their existing model
150
+ - Appends new models at the end of the file
151
+ - Never regenerates or reformats existing content
152
+
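The "only add new entries" rule can be illustrated on plain hashes. This is a sketch under stated assumptions: `merge_append_only` is a hypothetical helper, and it works on parsed data only, so it does not show how the real scanner preserves comments and formatting in the file.

```ruby
# Hypothetical sketch of an append-only merge; names are illustrative.
# existing:   config already in the YAML file, e.g. {"User" => {"email" => "email"}}
# discovered: columns found by a scan, in the same shape
# Returns the merged config and a list of the entries that were added.
def merge_append_only(existing, discovered)
  merged = existing.transform_values(&:dup) # leave the caller's hash untouched
  added = []
  discovered.each do |model, columns|
    merged[model] ||= {}
    columns.each do |column, type|
      next if merged[model].key?(column) # keep existing entries and manual overrides
      merged[model][column] = type
      added << "#{model}.#{column}"
    end
  end
  [merged, added]
end
```

Because Ruby hashes keep insertion order, new columns land after the existing ones for their model and new models land at the end, which mirrors the behavior described above.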
153
+ ## Manual Configuration
154
+
155
+ You can also manually edit `config/encrypted_columns.yml`:
156
+
157
+ ```yaml
158
+ User:
159
+   email: email
160
+   phone_number: phone
161
+   ssn: ssn
162
+   first_name: first_name
163
+   last_name: last_name
164
+   date_of_birth: text
165
+
166
+ Patient:
167
+   medical_record_number: text
168
+   emergency_contact_phone: phone
169
+
170
+ CreditCard:
171
+   card_number: text
172
+   cvv: text
173
+   cardholder_name: name
174
+ ```
175
+
176
+ ## Built-in Anonymization Types
177
+
178
+ | Type | Example Output | Use Case |
179
+ |------|----------------|----------|
180
+ | `:email` | `user_a1b2c3d4@example.com` | Email addresses |
181
+ | `:phone` | `+15551234567` | Phone numbers |
182
+ | `:ssn` | `123-45-6789` | Social Security Numbers |
183
+ | `:name` | `Anonymous User abc123` | Full names |
184
+ | `:first_name` | `John`, `Jane`, `Alex` | First names |
185
+ | `:last_name` | `Smith`, `Johnson`, `Williams` | Last names |
186
+ | `:address` | `1234 Anonymous St, City, ST 12345` | Full addresses |
187
+ | `:text` | `Anonymized text a1b2c3d4` | Generic text |
188
+
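To make the output shapes in the table concrete, here is a rough sketch of generators that produce values in the same formats. `SKETCH_GENERATORS` is a hypothetical name and the bodies are assumptions; the gem's built-in generators may be written differently.

```ruby
require "securerandom"

# Illustrative generators only; the gem's real generators may differ.
# Each lambda returns a fresh value in the format shown in the table.
SKETCH_GENERATORS = {
  # "user_" + 8 hex characters + "@example.com"
  email: -> { "user_#{SecureRandom.hex(4)}@example.com" },
  # "+1555" followed by 7 zero-padded digits
  phone: -> { format("+1555%07d", SecureRandom.random_number(10_000_000)) },
  # NNN-NN-NNNN with zero-padded groups
  ssn:   -> { format("%03d-%02d-%04d", rand(1..899), rand(1..99), rand(1..9999)) },
  # Generic text with an 8-character hex suffix
  text:  -> { "Anonymized text #{SecureRandom.hex(4)}" }
}.freeze

puts SKETCH_GENERATORS[:email].call
```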
189
+ ## Custom Anonymization Types 🎨
190
+
191
+ Define your own custom anonymization generators for domain-specific data:
192
+
193
+ ```bash
194
+ # Generate initializer for custom generators
195
+ rails generate column_anonymizer:initializer
196
+ ```
197
+
198
+ Then register your custom types:
199
+
200
+ ```ruby
201
+ # config/initializers/column_anonymizer.rb
202
+
203
+ ColumnAnonymizer::Anonymizer.register(:credit_card) do
204
+   "XXXX-XXXX-XXXX-#{rand(1000..9999)}"
205
+ end
206
+
207
+ ColumnAnonymizer::Anonymizer.register(:employee_id) do
208
+   "EMP-#{Time.now.year}-#{rand(10000..99999)}"
209
+ end
210
+ ```
211
+
212
+ Use them in your config:
213
+
214
+ ```yaml
215
+ # config/encrypted_columns.yml
216
+ User:
217
+   credit_card_number: credit_card # Custom type!
218
+   employee_number: employee_id # Custom type!
219
+ ```
220
+
221
+ **See [CUSTOM_GENERATORS_GUIDE.md](CUSTOM_GENERATORS_GUIDE.md) for complete documentation and examples.**
222
+
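The register-then-look-up flow can be pictured as a small registry keyed by type name. The sketch below is a stand-in for illustration only: `SketchRegistry` and its `generate` method are assumptions, not the gem's documented API (only `ColumnAnonymizer::Anonymizer.register` appears in this README).

```ruby
# Hypothetical stand-in for a generator registry; the class name and
# `generate` method are assumptions, not the gem's documented API.
class SketchRegistry
  def initialize
    @generators = {}
  end

  # Stores the block under the given type name.
  def register(type, &block)
    raise ArgumentError, "block required" unless block

    @generators[type.to_sym] = block
  end

  # Looks up the generator by name (string or symbol) and calls it.
  def generate(type)
    generator = @generators.fetch(type.to_sym) do
      raise KeyError, "no generator registered for #{type.inspect}"
    end
    generator.call
  end
end

registry = SketchRegistry.new
registry.register(:credit_card) { "XXXX-XXXX-XXXX-#{rand(1000..9999)}" }
puts registry.generate("credit_card")
```

Converting names with `to_sym` lets the string values read from the YAML file match generators registered with symbols.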
223
+ ## Usage Examples
224
+
225
+ ### Anonymize a Single Model
226
+
227
+ ```ruby
228
+ user = User.find(123)
229
+ ColumnAnonymizer::Anonymizer.anonymize_model!(user)
230
+ user.reload
231
+
232
+ puts user.email # => "user_abc12345@example.com"
233
+ puts user.phone # => "+15551234567"
234
+ puts user.ssn # => "123-45-6789"
235
+ ```
236
+
237
+ ### Anonymize All Records
238
+
239
+ ```ruby
240
+ User.find_each do |user|
241
+   ColumnAnonymizer::Anonymizer.anonymize_model!(user)
242
+ end
243
+ ```
244
+
245
+ ### Anonymize Specific Records
246
+
247
+ ```ruby
248
+ # Anonymize users created more than a year ago
249
+ User.where("created_at < ?", 1.year.ago).find_each do |user|
250
+   ColumnAnonymizer::Anonymizer.anonymize_model!(user)
251
+ end
252
+ ```
253
+
254
+ ### Bulk Anonymization with Rake Tasks 🚀
255
+
256
+ The gem includes powerful Rake tasks for bulk anonymization:
257
+
258
+ ```bash
259
+ # Anonymize all models and all records
260
+ rake column_anonymizer:anonymize_all
261
+
262
+ # Anonymize a specific model
263
+ rake column_anonymizer:anonymize_model[User]
264
+
265
+ # Anonymize records matching a condition
266
+ rake "column_anonymizer:anonymize_where[User,created_at < '2023-01-01']"
267
+
268
+ # Preview what will be anonymized (dry run)
269
+ rake column_anonymizer:preview
270
+
271
+ # Show statistics
272
+ rake column_anonymizer:stats
273
+ ```
274
+
275
+ **Example Output:**
276
+ ```
277
+ 📋 Processing User...
278
+ Columns: email, phone, ssn
279
+ Records: 1,523
280
+ ✅ Anonymized 1,523 record(s)
281
+
282
+ 🎉 Anonymization complete!
283
+ Total records anonymized: 1,523
284
+ ```
285
+
286
+ **See [RAKE_TASKS_GUIDE.md](RAKE_TASKS_GUIDE.md) for complete documentation.**
287
+
288
+ ### Custom Rake Task
289
+
290
+ You can also create your own tasks:
291
+
292
+ ```ruby
293
+ # lib/tasks/anonymize.rake
294
+ namespace :data do
295
+   desc "Anonymize old user data"
296
+   task anonymize_old_users: :environment do
297
+     count = 0
298
+     User.where("created_at < ?", 2.years.ago).find_each do |user|
299
+       ColumnAnonymizer::Anonymizer.anonymize_model!(user)
300
+       count += 1
301
+     end
302
+     puts "Anonymized #{count} users"
303
+   end
304
+ end
305
+ ```
306
+
307
+ Then run:
308
+
309
+ ```bash
310
+ rails data:anonymize_old_users
311
+ ```
312
+
313
+ ## Development & Testing
314
+
315
+ ### Reload Schema in Console
316
+
317
+ ```ruby
318
+ # In Rails console
319
+ ColumnAnonymizer::SchemaLoader.reload_schema!
320
+ User.reload_encrypted_columns_metadata!
321
+ ```
322
+
323
+ ### Check Current Configuration
324
+
325
+ ```ruby
326
+ # View metadata for a model
327
+ User.encrypted_columns_metadata
328
+ # => {:email=>:email, :phone=>:phone, :ssn=>:ssn}
329
+
330
+ # View all loaded schema
331
+ ColumnAnonymizer::SchemaLoader.load_schema
332
+ # => {"User"=>{"email"=>"email", "phone"=>"phone", "ssn"=>"ssn"}}
333
+ ```
334
+
335
+ ## Generator Commands
336
+
337
+ | Command | Description |
338
+ |---------|-------------|
339
+ | `rails generate column_anonymizer:install` | Create config file |
340
+ | `rails generate column_anonymizer:install --scan` | Install and scan models |
341
+ | `rails generate column_anonymizer:scan` | Scan models and update config |
342
+ | `rails generate column_anonymizer:initializer` | Create initializer for custom generators |
343
+
344
+ ## Configuration File
345
+
346
+ The gem expects a YAML file at `config/encrypted_columns.yml`:
347
+
348
+ ```yaml
349
+ # Format: ModelName -> column_name -> anonymization_type
350
+ User:
351
+   email: email
352
+   phone: phone
353
+   ssn: ssn
354
+
355
+ Patient:
356
+   medical_record_number: text
357
+   emergency_contact_phone: phone
358
+ ```
359
+
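As a quick way to sanity-check a file in this layout, the sketch below parses config text with Ruby's standard `yaml` library and converts it to symbol-keyed metadata per model. `parse_column_config` is a hypothetical helper, not part of the gem, and the result shape is only modeled on the `encrypted_columns_metadata` output shown earlier.

```ruby
require "yaml"

# Parses config text in the documented ModelName -> column -> type layout
# and returns symbol-keyed metadata per model. Helper name is hypothetical.
def parse_column_config(yaml_text)
  raw = YAML.safe_load(yaml_text) || {} # an empty or comment-only file yields {}
  raw.each_with_object({}) do |(model, columns), result|
    result[model] = (columns || {}).to_h { |col, type| [col.to_sym, type.to_sym] }
  end
end

config = parse_column_config(<<~YAML)
  # Format: ModelName -> column_name -> anonymization_type
  User:
    email: email
    phone: phone
YAML

p config
```

Note that the nesting depends on indentation: column entries must be indented under their model name for the file to parse into this structure.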
360
+ ## Why YAML Configuration?
361
+
362
+ ✅ **Centralized**: All column types in one file
363
+ ✅ **Version Controlled**: Track changes in git
364
+ ✅ **No Code Changes**: Use standard Rails `encrypts`
365
+ ✅ **Easy Updates**: Modify types without touching models
366
+ ✅ **Automatic Discovery**: Scan feature populates config
367
+ ✅ **Team Friendly**: Clear overview of all encrypted data
368
+
369
+ ## Requirements
370
+
371
+ - Ruby 2.7+
372
+ - Rails 7.0+
373
+ - Active Record Encryption enabled
374
+
375
+ ## License
376
+
377
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
378
+
379
+ ## Contributing
380
+
381
+ 1. Fork it
382
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
383
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
384
+ 4. Push to the branch (`git push origin my-new-feature`)
385
+ 5. Create a new pull request
386
+
387
+ ## Support
388
+
389
+ For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/hunter-kendall/column_anonymizer).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
@@ -0,0 +1,141 @@
1
+ # Test Script for Scan Generator
2
+
3
+ This document describes how to test the scan generator.
4
+
5
+ ## Setup Test Rails App
6
+
7
+ ```bash
8
+ # Create a test Rails app
9
+ rails new test_app --skip-bundle
10
+ cd test_app
11
+
12
+ # Add the gem to Gemfile
13
+ echo "gem 'column_anonymizer', path: '/path/to/column_anonymizer'" >> Gemfile
14
+ bundle install
15
+
16
+ # Create the config file
17
+ rails generate column_anonymizer:install
18
+ ```
19
+
20
+ ## Create Test Models
21
+
22
+ ```bash
23
+ # Create some test models with encrypted attributes
24
+ rails generate model User email:string phone:string ssn:string
25
+ rails generate model Patient medical_record_number:string emergency_contact_phone:string
26
+
27
+ # Add encrypts calls to models
28
+ ```
29
+
30
+ Edit `app/models/user.rb`:
31
+ ```ruby
32
+ class User < ApplicationRecord
33
+   encrypts :email
34
+   encrypts :phone
35
+   encrypts :ssn
36
+ end
37
+ ```
38
+
39
+ Edit `app/models/patient.rb`:
40
+ ```ruby
41
+ class Patient < ApplicationRecord
42
+   encrypts :medical_record_number
43
+   encrypts :emergency_contact_phone
44
+ end
45
+ ```
46
+
47
+ ## Test the Scanner
48
+
49
+ ```bash
50
+ # Run the scan generator
51
+ rails generate column_anonymizer:scan
52
+ ```
53
+
54
+ Expected output:
55
+ ```
56
+ 🔍 Scanning models for encrypted attributes...
57
+ ➕ Adding User.email as 'email'
58
+ ➕ Adding User.phone as 'phone'
59
+ ➕ Adding User.ssn as 'ssn'
60
+ ➕ Adding Patient.medical_record_number as 'text'
61
+ ➕ Adding Patient.emergency_contact_phone as 'phone'
62
+ ✅ Scanned 2 model(s) with encrypted attributes
63
+ 📝 Updated config/encrypted_columns.yml
64
+ User: email, phone, ssn
65
+ Patient: medical_record_number, emergency_contact_phone
66
+ ```
67
+
68
+ ## Verify Config File
69
+
70
+ ```bash
71
+ cat config/encrypted_columns.yml
72
+ ```
73
+
74
+ Expected content:
75
+ ```yaml
76
+ ---
77
+ User:
78
+   email: email
79
+   phone: phone
80
+   ssn: ssn
81
+ Patient:
82
+   medical_record_number: text
83
+   emergency_contact_phone: phone
84
+ ```
85
+
86
+ ## Test Re-running Scanner
87
+
88
+ ```bash
89
+ # Run again to verify it doesn't overwrite existing entries
90
+ rails generate column_anonymizer:scan
91
+ ```
92
+
93
+ Expected output:
94
+ ```
95
+ 🔍 Scanning models for encrypted attributes...
96
+ ℹ️ Skipping User.email (already configured as 'email')
97
+ ℹ️ Skipping User.phone (already configured as 'phone')
98
+ ℹ️ Skipping User.ssn (already configured as 'ssn')
99
+ ℹ️ Skipping Patient.medical_record_number (already configured as 'text')
100
+ ℹ️ Skipping Patient.emergency_contact_phone (already configured as 'phone')
101
+ ✅ Scanned 2 model(s) with encrypted attributes
102
+ 📝 Updated config/encrypted_columns.yml
103
+ User: email, phone, ssn
104
+ Patient: medical_record_number, emergency_contact_phone
105
+ ```
106
+
107
+ ## Test Install with Scan
108
+
109
+ ```bash
110
+ # Remove config file
111
+ rm config/encrypted_columns.yml
112
+
113
+ # Install and scan in one step
114
+ rails generate column_anonymizer:install --scan
115
+ ```
116
+
117
+ ## Test Patterns
118
+
119
+ The scanner should detect these patterns correctly:
120
+
121
+ | Model Attribute | Expected Type |
122
+ |----------------|---------------|
123
+ | `email` | `email` |
124
+ | `phone`, `mobile_phone`, `cell_phone` | `phone` |
125
+ | `ssn`, `social_security_number` | `ssn` |
126
+ | `first_name` | `first_name` |
127
+ | `last_name`, `surname` | `last_name` |
128
+ | `full_name`, `name` | `name` |
129
+ | `address`, `street_address` | `address` |
130
+ | `credit_card_number` | `text` |
131
+ | `password_digest` | `text` |
132
+ | `api_token` | `text` |
133
+
134
+ ## Success Criteria
135
+
136
+ - ✅ Scanner finds all models with `encrypts` calls
137
+ - ✅ Type guessing works correctly for common column names
138
+ - ✅ Existing config entries are preserved (not overwritten)
139
+ - ✅ Multiple attributes in a single `encrypts` call are detected
140
+ - ✅ Config file has valid YAML format
141
+ - ✅ Install with `--scan` works in one step