grainery 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ab9b7d46f3a3716cf0f043060f4f267a9be926ffae12cdb477dd85df016529e3
- data.tar.gz: 44edc342af67c5fce8f77559830eada34772b18f5ac226c76886d1fe65a55fa0
+ metadata.gz: 24d14592917d3f6680818f8096acd7f68f63a7e47bbb26aadbc19084ee106845
+ data.tar.gz: 9eb6671fcd81703bc1a6c450a84ef58213b396c0e3eb731a116dba7d5c22eec6
  SHA512:
- metadata.gz: 5eea1be971b0ee1b02619377c1b2ec44fc68e8ea302653164a754d04493d31bbe914906391061ca2eaa773794890f6b4a49239c15aefbdc0c8386daa20bef518
- data.tar.gz: 40258587446259c0e5770b61b1d7282fb7750d64b14aed6983cee287338e4d65338a7b363455c292f1fe56cfcc7f9dd93e71af12f1618a506c7dec71ef5fa35e
+ metadata.gz: d934e0497f34309c019774b752ba83e83e6f083ab61961ebf7f16235290492c1674a7209fda111c56d9044464f598be7f99ac35b6ba2ef5074dfe95b5b62be1a
+ data.tar.gz: 321aaf5825073cf57e747ceda98ae4d4ba1034e180b60a5583474422db80d13fe0507fda4df8b3d71d67b71dc58e25952d22c3abb9e6dba8e7cff8519aa34f21
data/CHANGELOG.md CHANGED
@@ -5,6 +5,69 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [0.2.0] - 2025-10-01
+
+ ### Added
+ - Database schema dumping functionality for all related databases
+ - Schema files are now generated in each database directory (e.g., `db/grainery/primary/schema.rb`)
+ - New rake task: `grainery:load_with_schema` - Load schemas and seeds together
+ - New rake task: `grainery:generate_data_only` - Harvest data without schema dump
+ - New rake task: `grainery:generate_raw` - Harvest data without anonymization
+ - Schema loading option in the `load_seeds` method via the `load_schema` parameter
+ - **Production environment protection**: Destructive tasks are now blocked in production by default
+   - Affects: `grainery:load`, `grainery:load_with_schema`, and all `test:db:*` tasks
+   - Override with the `GRAINERY_ALLOW_PRODUCTION=true` environment variable
+   - Includes a 5-second safety countdown when the override is used
+ - **Data Anonymization**: Automatic anonymization of sensitive fields using the Faker gem
+ - **Automatic field detection**: `grainery:init_config` scans the database schema and auto-detects anonymizable fields
+ - **Scoped anonymization**: Support for table-specific and database-specific field configuration using `table.field` or `database.table.field` notation
+   - Automatic scoping when duplicate field names are detected across multiple tables
+ - Default anonymization for common fields (email, name, phone, address, SSN, credit cards, passwords, tokens, API keys)
+ - Greek-specific document anonymization:
+   - `greek_vat` - Greek VAT number (AFM - 9 digits)
+   - `greek_amka` - Greek Social Security Number (AMKA - 11 digits: DDMMYY + 5 digits)
+   - `greek_personal_number` - Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM)
+   - `greek_ada` - Greek ADA/Diavgeia Decision Number (14 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ")
+   - `greek_adam` - Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012")
+   - `iban` - Greek IBAN (27 characters)
+ - `date_of_birth` - Anonymizes birth dates while preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
+ - Selective anonymization: Use the `skip` value to preserve real data for specific non-sensitive fields
+ - All fake values automatically respect database column size limits
+ - Configurable anonymization via `anonymize_fields` in `config/grainery.yml`
+
+ ### Changed
+ - `grainery:generate` and `grainery:generate_all` now dump database schemas by default
+ - All generation tasks (`generate`, `generate_all`, `generate_data_only`) now anonymize data by default
+ - `grainery:load` continues to load only seed data (schema loading is opt-in via `grainery:load_with_schema`)
+ - Schema dump includes table definitions, columns with attributes, and indexes
+ - Updated test framework from RSpec to Minitest
+ - Rails dependency updated to support versions 6.1 through 8.x
+ - Ruby requirement updated to >= 3.2.0
+ - Anonymization happens during harvest, not during load
+ - Generated seed files contain anonymized data safe for version control
+
+ ### Security
+ - Added production environment safeguards to prevent accidental data loss
+ - Destructive operations require explicit opt-in via environment variable in production
+ - Sensitive data is now anonymized by default during harvest
+ - Safe to commit anonymized seed files to version control
+ - Lookup tables are not anonymized (reference data)
+
+ ### Technical Details
+ - Schema dumps use the `ActiveRecord::Schema.define` format
+ - Each database connection gets its own schema file
+ - Schema loading occurs before seed data when enabled
+ - Automatic detection and skipping of internal Rails tables (schema_migrations, ar_internal_metadata)
+ - Anonymization uses the Faker gem for realistic fake data
+ - Automatic field detection uses pattern matching on column names (email, phone, ssn, afm, amka, ada, adam, etc.)
+ - Detected anonymizable fields are automatically added to `config/grainery.yml` during initialization
+ - Scoped field resolution with priority: `database.table.field` > `table.field` > `field`
+ - When duplicate field names are detected, scoped configuration is used automatically
+ - Type-aware anonymization respects column data types and size limits
+ - String fields are automatically truncated to the column's maximum length
+ - Numeric fields maintain their data type
+ - Date of birth anonymization preserves age categories while protecting actual birth dates
+
  ## [0.1.0] - 2025-10-01

  ### Added
@@ -30,4 +93,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - `test:db:clean` - Truncate all test tables
  - `test:db:stats` - Show test database statistics

+ [0.2.0]: https://github.com/mpantel/grainery/releases/tag/v0.2.0
  [0.1.0]: https://github.com/mpantel/grainery/releases/tag/v0.1.0
data/README.md CHANGED
@@ -2,17 +2,23 @@

  Database seed storage system for Rails applications. Extract database records and generate seed files organized by database with automatic dependency resolution. Like a grainery stores grain, this gem stores and organizes your database seeds.

+ > **Note:** This gem was developed with assistance from [Claude](https://claude.ai), Anthropic's AI assistant. Claude helped with code generation, documentation, and testing strategies throughout the development process.
+
+ > **⚠️ Development Status:** This gem is in active development and does not yet have a comprehensive test suite. While the core functionality has been tested manually, automated tests are planned for future releases. Use with caution in production environments.
+
  ## Features

  - ✅ Automatic database detection
  - ✅ Dependency-aware loading (topological sort)
  - ✅ Multi-database support
+ - ✅ Database schema dumping for all related databases
  - ✅ Configurable per project
  - ✅ Preserves custom seeds
  - ✅ One seed file per table
  - ✅ Clean separation of concerns
  - ✅ Supports SQL Server, MySQL, PostgreSQL
  - ✅ Test database management tasks
+ - ✅ Rails 6.1 - 8.x support

  ## Installation

@@ -36,27 +42,48 @@ bundle install
  rake grainery:init_config
  ```

- This auto-detects all databases and creates `config/grainery.yml`.
+ This auto-detects the following and writes it to `config/grainery.yml`:
+ - All databases and model base classes
+ - Anonymizable fields in your database schema (email, phone, SSN, Greek documents, etc.)
+ - The anonymization method to use for each detected field

  ### 2. Harvest Data

  ```bash
- # Harvest with limit (100 records per table)
+ # Harvest with limit (100 records per table) + schema dump + anonymization
  rake grainery:generate

- # Harvest ALL records (use with caution)
+ # Harvest ALL records + schema dump + anonymization (use with caution)
  rake grainery:generate_all
+
+ # Harvest data only (no schema dump) + anonymization
+ rake grainery:generate_data_only
+
+ # Harvest without anonymization (raw production data - use with extreme caution!)
+ rake grainery:generate_raw
  ```

+ **Note:** By default, sensitive fields are anonymized using Faker. Configure anonymization in `config/grainery.yml`.
+
  ### 3. Load Seeds

  ```bash
+ # Load seeds only (blocked in production)
  rake grainery:load
+
+ # Load schemas + seeds (blocked in production)
+ rake grainery:load_with_schema
+
+ # Override production protection (use with extreme caution!)
+ GRAINERY_ALLOW_PRODUCTION=true rake grainery:load
  ```

+ **Note:** Loading tasks are blocked in production by default to prevent accidental data loss.
+
  This loads:
- 1. Harvested seeds (in dependency order)
- 2. Custom seeds from `db/seeds.rb` (last)
+ 1. Database schemas (if using `load_with_schema`)
+ 2. Harvested seeds (in dependency order)
+ 3. Custom seeds from `db/seeds.rb` (last)
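
The load order above puts harvested tables in dependency order, and the feature list describes this as a topological sort. Below is a minimal sketch of that idea using Ruby's standard `TSort` module. It assumes foreign-key dependencies are already available as a plain hash. The `LoadOrder` class and the `deps` data are hypothetical illustrations, not Grainery's API:

```ruby
require "tsort"

# Hypothetical helper: orders tables so each one loads after the tables its
# foreign keys reference. `deps` maps table => tables it references.
class LoadOrder
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort calls this to enumerate every node (table) in the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # TSort calls this to enumerate the dependencies of one table.
  # Tables that are referenced but not listed are treated as having none.
  def tsort_each_child(table, &block)
    @deps.fetch(table, []).each(&block)
  end
end

deps = {
  "comments" => ["posts", "users"],
  "posts"    => ["users"],
  "users"    => []
}

order = LoadOrder.new(deps).tsort
puts order.inspect
```

`TSort#tsort` returns dependencies before the tables that depend on them. It raises `TSort::Cyclic` when tables reference each other, so a real implementation would need a policy for circular foreign keys.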

  ## Directory Structure

@@ -65,12 +92,15 @@ db/
  ├── grainery/                # Harvested seeds (auto-generated)
  │   ├── load_order.txt       # Load order respecting dependencies
  │   ├── primary/             # Primary database
+ │   │   ├── schema.rb        # Database schema dump
  │   │   ├── users.rb
  │   │   ├── posts.rb
  │   │   └── comments.rb
  │   ├── other/               # Other database
+ │   │   ├── schema.rb        # Database schema dump
  │   │   └── projects.rb
  │   └── banking/             # Banking database
+ │       ├── schema.rb        # Database schema dump
  │       └── employees.rb
  └── seeds.rb                 # Custom seeds (loaded last)
  ```
@@ -97,6 +127,51 @@ database_connections:

  # Lookup tables (harvest all records)
  lookup_tables: []
+
+ # Field anonymization (column_name => faker_method)
+ # Set to empty hash {} to disable anonymization
+ anonymize_fields:
+   email: email
+   first_name: first_name
+   last_name: last_name
+   name: name
+   phone: phone_number
+   phone_number: phone_number
+   address: address
+   street_address: street_address
+   city: city
+   state: state
+   zip: zip_code
+   zip_code: zip_code
+   postal_code: zip_code
+   ssn: ssn
+   credit_card: credit_card_number
+   password: password
+   token: token
+   api_key: api_key
+   secret: secret
+   iban: iban
+   vat_number: greek_vat
+   afm: greek_vat
+   amka: greek_amka
+   social_security_number: greek_amka
+   ssn_greek: greek_amka
+   personal_number: greek_personal_number
+   personal_id: greek_personal_number
+   afm_extended: greek_personal_number
+   ada: greek_ada
+   diavgeia_id: greek_ada
+   decision_number: greek_ada
+   adam: greek_adam
+   adam_number: greek_adam
+   procurement_id: greek_adam
+   date_of_birth: date_of_birth
+   birth_date: date_of_birth
+   dob: date_of_birth
+   birthdate: date_of_birth
+   identity_number: identity_number
+   id_number: identity_number
+   national_id: identity_number
  ```

  ## Available Rake Tasks
@@ -107,15 +182,24 @@ lookup_tables: []
  # Initialize configuration
  rake grainery:init_config

- # Harvest data (with limit)
+ # Harvest data (with limit) + schema dump + anonymization
  rake grainery:generate

- # Harvest ALL records
+ # Harvest ALL records + schema dump + anonymization
  rake grainery:generate_all

+ # Harvest data only (no schema dump) + anonymization
+ rake grainery:generate_data_only
+
+ # Harvest without anonymization (raw production data)
+ rake grainery:generate_raw
+
  # Load harvested + custom seeds
  rake grainery:load

+ # Load schemas + seeds + custom seeds
+ rake grainery:load_with_schema
+
  # Clean grainery directory
  rake grainery:clean
  ```
@@ -177,7 +261,33 @@ lookup_tables:
  - categories
  ```

- ## Seed File Format
+ ## File Formats
+
+ ### Schema File Format
+
+ Each database gets its own schema dump, written to `db/grainery/<database>/schema.rb`:
+
+ ```ruby
+ # Schema dump for primary database
+ # Generated: 2025-10-01 10:30:00
+ # Adapter: postgresql
+
+ ActiveRecord::Schema.define do
+
+   create_table "users", force: :cascade do |t|
+     t.string "email", null: false
+     t.string "name"
+     t.boolean "active", default: true
+     t.datetime "created_at", null: false
+     t.datetime "updated_at", null: false
+   end
+
+   add_index "users", ["email"], unique: true
+
+ end
+ ```
+
+ ### Seed File Format

  Each table gets its own seed file:

@@ -221,33 +331,225 @@ Setting.create!(key: 'app_name', value: 'My App')

  ### Development
  ```bash
- # Harvest production-like data for development
+ # Harvest production-like data (with schemas) for development
  rake grainery:generate
- rake grainery:load
+ rake grainery:load_with_schema
  ```

  ### Testing
  ```bash
- # Create test fixtures
+ # Create test fixtures with schemas
  rake grainery:generate
- # In test setup, load specific seeds as needed
+ # In test setup, load schemas and seeds
+ rake grainery:load_with_schema
  ```

  ### Staging
  ```bash
- # Harvest production data (anonymized)
+ # Harvest production data (anonymized) with schemas
  rake grainery:generate_all
  # Deploy to staging
- # Load on staging server
- rake grainery:load
+ # Load on staging server with full schema
+ rake grainery:load_with_schema
+ ```
+
+ ### Cross-Database Migration
+ ```bash
+ # Export from one database system
+ rake grainery:generate_all # Captures schema + data
+
+ # Import to another database system
+ rake grainery:load_with_schema # Recreates schema + loads data
  ```

  ## Safety Features

- 1. **Separate Directories**: Harvested seeds never touch `db/seeds.rb`
- 2. **Dependency Order**: Foreign keys respected automatically
- 3. **Custom Preservation**: Your `db/seeds.rb` always loads last
- 4. **Clean Command**: `rake grainery:clean` removes only harvested files
+ 1. **Production Environment Protection**: Destructive tasks (`grainery:load`, `grainery:load_with_schema`, `test:db:*`) are blocked in production
+    - Requires the explicit `GRAINERY_ALLOW_PRODUCTION=true` environment variable to override
+    - Includes a 5-second countdown when the override is used
+ 2. **Separate Directories**: Harvested seeds never touch `db/seeds.rb`
+ 3. **Dependency Order**: Foreign keys are respected automatically
+ 4. **Custom Preservation**: Your `db/seeds.rb` always loads last
+ 5. **Clean Command**: `rake grainery:clean` removes only harvested files
+ 6. **Optional Schema Loading**: Schemas load only when explicitly requested
+ 7. **Per-Database Schemas**: Each database gets its own isolated schema file
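
The production protection in item 1 can be pictured as a small guard that runs before any destructive task. This is a hypothetical sketch that assumes the environment name is passed in. In a Rails app that value would typically be `Rails.env`. `ensure_safe_environment!` is an illustrative name, not a Grainery method:

```ruby
# Hypothetical guard mirroring the documented behaviour: destructive tasks are
# refused in production unless GRAINERY_ALLOW_PRODUCTION=true, and an override
# prints a countdown (5 seconds by default) before continuing.
def ensure_safe_environment!(env_name, env: ENV, countdown: 5, out: $stdout)
  # Non-production environments are never blocked.
  return :allowed unless env_name.to_s == "production"

  unless env["GRAINERY_ALLOW_PRODUCTION"] == "true"
    raise "Refusing to run a destructive task in production. " \
          "Set GRAINERY_ALLOW_PRODUCTION=true to override."
  end

  out.puts "WARNING: destructive task running in PRODUCTION."
  countdown.downto(1) do |seconds_left|
    out.puts "Continuing in #{seconds_left}s (Ctrl-C to abort)..."
    sleep 1
  end
  :overridden
end

# Example: a development run is allowed immediately.
puts ensure_safe_environment!("development")
```

Raising an error, rather than returning `false`, stops a rake task as soon as the guard fails, so no later step can run by accident.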
+
+ ### Production Safety Matrix
+
+ **Safe Operations (Read-Only):**
+ - ✅ `rake grainery:generate` - Harvests data, no database modifications
+ - ✅ `rake grainery:generate_all` - Harvests all data, no database modifications
+ - ✅ `rake grainery:generate_data_only` - Harvests data only, no database modifications
+ - ✅ `rake grainery:init_config` - Creates config file only
+ - ✅ `rake grainery:clean` - Deletes harvested files only (not database data)
+
+ **Destructive Operations (Blocked by Default):**
+ - ❌ `rake grainery:load` - Inserts data into database
+ - ❌ `rake grainery:load_with_schema` - Modifies schema AND inserts data
+ - ❌ `rake test:db:*` - All test database operations
+
+ **Recommendation:**
+ - Harvesting in production is safe and useful for creating staging/development fixtures
+ - Because there is no automated test coverage yet, test any load thoroughly in staging before running it in production
+ - Always review generated files before loading into any environment
+
+ ## Data Anonymization
+
+ ✅ **Built-in Anonymization:** Grainery automatically anonymizes sensitive fields using the Faker gem during harvest.
+
+ ### Automatic Detection
+
+ When you run `rake grainery:init_config`, Grainery automatically:
+ 1. Scans all database tables and columns
+ 2. Detects fields that should be anonymized based on naming patterns
+ 3. Adds them to `config/grainery.yml` with appropriate anonymization methods
+
+ Detected patterns include: `email`, `phone`, `address`, `ssn`, `password`, `token`, Greek documents (`afm`, `amka`, `ada`, `adam`), dates of birth, and more.
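
The README does not spell out the exact detection rules, so the code below is only a hypothetical sketch of name-based detection in the spirit of the steps above. The regexes in `DETECTION_PATTERNS` and the `detect_anonymizable` helper are assumptions. For duplicate column names the sketch emits `table.column` keys, following the changelog's description of automatic scoping:

```ruby
# Hypothetical column-name patterns mapped to anonymization methods from
# config/grainery.yml. The first matching pattern wins, so order matters.
DETECTION_PATTERNS = [
  [/\A(dob|birth_?date|date_of_birth)\z/, "date_of_birth"],
  [/email/,                               "email"],
  [/phone|mobile/,                        "phone_number"],
  [/\Aafm\z|vat_number/,                  "greek_vat"],
  [/amka/,                                "greek_amka"],
  [/\Aada\z|diavgeia/,                    "greek_ada"],
  [/\Aadam/,                              "greek_adam"],
  [/\Assn\z/,                             "ssn"],
  [/password/,                            "password"],
  [/token/,                               "token"],
  [/api_key/,                             "api_key"]
].freeze

# Scans { table => [column, ...] } and returns config entries. A column name
# found in more than one table gets "table.column" keys instead of a bare key.
def detect_anonymizable(columns_by_table)
  matches = []
  columns_by_table.each do |table, columns|
    columns.each do |column|
      _, strategy = DETECTION_PATTERNS.find { |regex, _| regex.match?(column) }
      matches << [table, column, strategy] if strategy
    end
  end

  # How many tables contain each matched column name.
  counts = matches.group_by { |_, column, _| column }.transform_values(&:size)

  matches.each_with_object({}) do |(table, column, strategy), config|
    key = counts[column] > 1 ? "#{table}.#{column}" : column
    config[key] = strategy
  end
end

schema = {
  "users"     => %w[id email phone afm created_at],
  "customers" => %w[id email amka],
  "decisions" => %w[id ada adam_number]
}
p detect_anonymizable(schema)
```

Name matching alone has false positives. For example, `/email/` would also match a boolean `email_verified` column. A fuller implementation would probably check column types as well.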
+
+ ### How It Works
+
+ When harvesting, Grainery automatically replaces sensitive field values with fake data:
+
+ ```ruby
+ # Original production data:
+ { email: "john.doe@company.com", name: "John Doe", phone: "555-1234" }
+
+ # Anonymized in seed files:
+ { email: "jane_smith@example.org", name: "Sarah Johnson", phone: "555-987-6543" }
+ ```
+
+ ### Configuration
+
+ The `config/grainery.yml` file is automatically populated with detected fields during initialization. You can customize it as needed:
+
+ ```yaml
+ anonymize_fields:
+   # Global field configuration (applies to all tables)
+   email: email                  # Uses Faker::Internet.email
+   first_name: first_name        # Uses Faker::Name.first_name
+   last_name: last_name          # Uses Faker::Name.last_name
+   name: name                    # Uses Faker::Name.name
+   phone: phone_number           # Uses Faker::PhoneNumber.phone_number
+   ssn: ssn                      # Uses Faker::IDNumber.valid
+
+   # Table-specific configuration (when the same field appears in multiple tables)
+   users.address: address        # Only anonymize address in users table
+   companies.address: skip       # Don't anonymize address in companies table
+
+   # Database- and table-specific configuration (most specific)
+   primary.users.email: email    # Only for users table in primary database
+   other.contacts.email: email   # Only for contacts table in other database
+ ```
+
+ **Scoping Priority:**
+ 1. `database.table.field` (highest priority - most specific)
+ 2. `table.field` (medium priority - table-specific)
+ 3. `field` (lowest priority - global)
+
+ When a field name appears in multiple tables, Grainery automatically uses scoped names during detection.
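
The scoping priority and the `skip` value combine into a simple lookup: try the most specific key first and fall back to the global one. The sketch below is hypothetical. `resolve_strategy`, `anonymize_record` and the two stand-in generators are illustrative names, and the generators replace Faker so the example has no gem dependency. Leaving `nil` values untouched is also an assumption of this sketch:

```ruby
# Looks up the anonymization method for one column, most specific key first:
# database.table.field > table.field > field. Returns nil when unconfigured.
def resolve_strategy(config, database, table, field)
  ["#{database}.#{table}.#{field}", "#{table}.#{field}", field].each do |key|
    return config[key] if config.key?(key)
  end
  nil
end

# Stand-ins for Faker-backed generators (hypothetical, stdlib only).
GENERATORS = {
  "email"        => -> { "user#{rand(1000..9999)}@example.org" },
  "phone_number" => -> { format("555-%03d-%04d", rand(1000), rand(10_000)) }
}.freeze

# Returns a new hash with configured fields replaced. Fields that resolve to
# "skip", are unconfigured, or hold nil keep their original value.
# An unknown method name raises KeyError from Hash#fetch.
def anonymize_record(record, config, database:, table:)
  record.to_h do |field, value|
    strategy = resolve_strategy(config, database, table, field.to_s)
    if strategy.nil? || strategy == "skip" || value.nil?
      [field, value]
    else
      [field, GENERATORS.fetch(strategy).call]
    end
  end
end

config = {
  "email"               => "email",        # global
  "companies.email"     => "skip",         # table-specific
  "primary.users.phone" => "phone_number"  # database.table-specific
}

user = { email: "john.doe@company.com", phone: "555-1234", name: "John Doe" }
p anonymize_record(user, config, database: "primary", table: "users")
```

Because `Hash#to_h` builds a new hash, the harvested record is never mutated. This matches the README's point that original production data is never modified.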
+
+ ### Disabling Anonymization
+
+ **Option 1: Disable completely**
+
+ ```yaml
+ # Set to empty hash
+ anonymize_fields: {}
+ ```
+
+ **Option 2: Use raw generation task**
+
+ ```bash
+ rake grainery:generate_raw # Harvests without anonymization
+ ```
+
+ **Option 3: Skip specific fields**
+
+ To keep real values for specific fields while anonymizing others, set them to `skip`:
+
+ ```yaml
+ anonymize_fields:
+   email: email          # Will be anonymized
+   name: name            # Will be anonymized
+   company_name: skip    # Will keep real value (not anonymized)
+   department: skip      # Will keep real value (not anonymized)
+   phone: phone_number   # Will be anonymized
+ ```
+
+ This is useful when you need to preserve certain non-sensitive reference data while still protecting personal information.
+
+ ### Supported Faker Methods
+
+ **Personal Information:**
+ - `email` - Fake email addresses
+ - `first_name`, `last_name`, `name` - Fake names
+ - `phone_number` - Fake phone numbers
+ - `address`, `street_address` - Fake addresses
+ - `city`, `state`, `zip_code`, `postal_code` - Fake location data
+ - `date_of_birth` - Fake date of birth preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
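
The `date_of_birth` rule can be read as two steps. First, shift the real date by up to two years in either direction. Second, make sure the result is at least 18 years before today. The sketch below is hypothetical and uses only Ruby's `date` library. Treating ±2 years as ±730 days is an assumption, as is clamping every result rather than only adults' dates:

```ruby
require "date"

# Hypothetical date_of_birth anonymizer: random shift of up to about two years
# (±730 days), then clamped so the fake person is at least 18 years old.
def anonymize_date_of_birth(real_dob, today: Date.today, rng: Random.new)
  shifted = real_dob + rng.rand(-730..730)
  latest_adult_dob = today.prev_year(18) # anyone born on or before this is 18+
  [shifted, latest_adult_dob].min
end

p anonymize_date_of_birth(Date.new(1990, 5, 17))
```

The clamp only ever moves a date earlier. For people close to 18, the fake date can therefore drift further than two years from the real one.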
+
+ **Financial & Identity:**
+ - `ssn` - Fake social security numbers
+ - `credit_card_number` - Fake credit card numbers
+ - `iban` - Fake Greek IBAN (27 characters: GR + check digits + bank code + account number, auto-truncates to column size)
+ - `greek_vat` - Fake Greek VAT number (AFM - 9 digits, adjusts to column size)
+ - `greek_amka` - Fake Greek AMKA/Social Security Number (11 digits: DDMMYY + 5 digits, adjusts to column size)
+ - `greek_personal_number` - Fake Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM, e.g., "12A123456789", adjusts to column size)
+ - `greek_ada` - Fake Greek ADA/Diavgeia Decision Number (14 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ", adjusts to column size)
+ - `greek_adam` - Fake Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012", adjusts to column size)
+ - `identity_number` - Fake identity number (alphanumeric format, adjusts to column size)
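
The Greek identifier formats above are fixed-shape strings, so a generator only needs random characters in the right positions. Below is a hypothetical sketch of four of them. The helper names are illustrative. The code reproduces only the documented shapes and computes no official check digit, so the values should not be expected to pass real validation. Reading the two leading ADAM digits as a year is also an assumption:

```ruby
require "date"

# Hypothetical generators that reproduce only the documented shapes.
def random_digits(count, rng)
  Array.new(count) { rng.rand(10) }.join
end

# greek_vat: 9 digits (no AFM check digit is computed).
def fake_greek_vat(rng = Random.new)
  random_digits(9, rng)
end

# greek_amka: DDMMYY of a birth date + 5 digits = 11 digits.
def fake_greek_amka(birth_date, rng = Random.new)
  birth_date.strftime("%d%m%y") + random_digits(5, rng)
end

# greek_personal_number: 2 digits + 1 letter + 9-digit AFM = 12 characters.
def fake_greek_personal_number(rng = Random.new)
  random_digits(2, rng) + ("A".."Z").to_a.sample(random: rng) + fake_greek_vat(rng)
end

# greek_adam: 2 digits (assumed: year) + "REQ" or "PROC" + 9 digits = 14 or 15 chars.
def fake_greek_adam(rng = Random.new, year: Date.today.year)
  format("%02d", year % 100) + %w[REQ PROC].sample(random: rng) + random_digits(9, rng)
end

rng = Random.new(2025)
p fake_greek_amka(Date.new(1985, 3, 7), rng)
p fake_greek_adam(rng, year: 2024)
```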
+
+ **Security:**
+ - `password` - Fake passwords (auto-truncates to column size)
+ - `token` - Random alphanumeric strings (defaults to 32 characters, adjusts to column size)
+ - `api_key` - Random alphanumeric strings (defaults to 40 characters, adjusts to column size)
+ - `secret` - Random alphanumeric strings (defaults to 64 characters, adjusts to column size)
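
Both the security generators and the general "adjusts to column size" behaviour come down to the same operation: produce a value of a default length, then shorten it when the column is narrower. In a Rails app the limit could come from ActiveRecord column metadata such as `column.limit`. Below is a hypothetical sketch using Ruby's `SecureRandom`. `DEFAULT_LENGTHS` copies the defaults listed above. The helper names, and the choice to only shrink values and never grow them, are assumptions:

```ruby
require "securerandom"

# Default lengths taken from the list above.
DEFAULT_LENGTHS = { "token" => 32, "api_key" => 40, "secret" => 64 }.freeze

# Random alphanumeric value of the default length, shortened to the column
# limit when the column is narrower (nil limit = unlimited).
def fake_secret_value(kind, column_limit: nil)
  length = DEFAULT_LENGTHS.fetch(kind)
  length = [length, column_limit].min if column_limit
  SecureRandom.alphanumeric(length)
end

# Generic helper: cut string values to the column limit; leave other types alone.
def fit_to_column(value, limit)
  return value unless value.is_a?(String) && limit
  value[0, limit]
end

p fake_secret_value("api_key", column_limit: 16)
p fit_to_column("jane_smith@example.org", 10)
```

Plain truncation can cut an email address in the middle. A generator that respects the limit while building the value would produce more realistic data.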
+
+ ### Custom Field Mapping
+
+ Add your own field mappings to anonymize custom columns:
+
+ ```yaml
+ anonymize_fields:
+   # Global mappings
+   employee_id: ssn
+   mobile: phone_number
+   home_address: address
+   work_email: email
+   tax_id: ssn
+   bank_account: iban
+   tin: greek_vat
+   social_insurance: greek_amka
+   citizen_id: greek_personal_number
+   passport_number: identity_number
+   diavgeia_decision: greek_ada
+   procurement_number: greek_adam
+   birth_date: date_of_birth
+
+   # Scoped examples for duplicate fields
+   users.status: skip                   # Don't anonymize status in users
+   orders.status: skip                  # Don't anonymize status in orders
+   primary.employees.department: skip   # Department in primary.employees
+   other.staff.department: skip         # Department in other.staff
+
+   # Skip anonymization for non-sensitive fields
+   company_name: skip
+   department: skip
+   job_title: skip
+ ```
+
+ ### Important Notes
+
+ - Anonymization happens **during harvest**, not during load
+ - Generated seed files contain anonymized data
+ - Original production data is never modified
+ - Safe to commit anonymized seed files to version control
+ - Lookup tables are not anonymized (reference data)
+ - Anonymization can be disabled per harvest using the `generate_raw` task
+ - **Respects database constraints**: Fake values are automatically truncated to match column size limits
+ - **Type-aware**: String fields respect their maximum length, numeric fields maintain their data type
+ - **Selective anonymization**: Use `skip` to preserve real values for specific fields while anonymizing others
+ - **Scoped configuration**: When the same field name appears in multiple tables, use `table.field` or `database.table.field` notation for table-specific or database-specific anonymization

  ## Best Practices