grainery 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +64 -0
- data/README.md +321 -19
- data/lib/grainery/grainer.rb +671 -9
- data/lib/grainery/version.rb +1 -1
- data/lib/tasks/grainery_tasks.rake +60 -3
- data/lib/tasks/test_db_tasks.rake +12 -0
- metadata +27 -7
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 24d14592917d3f6680818f8096acd7f68f63a7e47bbb26aadbc19084ee106845
|
|
4
|
+
data.tar.gz: 9eb6671fcd81703bc1a6c450a84ef58213b396c0e3eb731a116dba7d5c22eec6
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: d934e0497f34309c019774b752ba83e83e6f083ab61961ebf7f16235290492c1674a7209fda111c56d9044464f598be7f99ac35b6ba2ef5074dfe95b5b62be1a
|
|
7
|
+
data.tar.gz: 321aaf5825073cf57e747ceda98ae4d4ba1034e180b60a5583474422db80d13fe0507fda4df8b3d71d67b71dc58e25952d22c3abb9e6dba8e7cff8519aa34f21
|
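The new SHA256 values above can be compared against a downloaded copy of the gem. The following is a minimal sketch, assuming the `.gem` archive has already been unpacked (for example with `tar -xf`) into a directory that contains `metadata.gz` and `data.tar.gz`. The helper names `sha256_matches?` and `verify_gem_members` are illustrative and are not part of grainery or RubyGems.

```ruby
require "digest"

# SHA256 values copied from the 0.2.0 checksums.yaml shown above.
EXPECTED_SHA256 = {
  "metadata.gz" => "24d14592917d3f6680818f8096acd7f68f63a7e47bbb26aadbc19084ee106845",
  "data.tar.gz" => "9eb6671fcd81703bc1a6c450a84ef58213b396c0e3eb731a116dba7d5c22eec6"
}.freeze

# Hypothetical helper: true when the SHA256 of `bytes` equals the expected hex digest.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Hypothetical helper: checks each archive member found in `dir`.
# Returns a hash of member name => true/false; a missing file counts as false.
def verify_gem_members(dir, expected = EXPECTED_SHA256)
  expected.to_h do |name, hex|
    path = File.join(dir, name)
    [name, File.file?(path) && sha256_matches?(File.binread(path), hex)]
  end
end
```

Calling `verify_gem_members("path/to/unpacked/gem")` returns one boolean per member, so a mismatch in either file is visible without stopping at the first failure.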
data/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,69 @@ All notable changes to this project will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [0.2.0] - 2025-10-01
|
|
9
|
+
|
|
10
|
+
### Added
|
|
11
|
+
- Database schema dumping functionality for all related databases
|
|
12
|
+
- Schema files now generated in each database directory (e.g., `db/grainery/primary/schema.rb`)
|
|
13
|
+
- New rake task: `grainery:load_with_schema` - Load schemas and seeds together
|
|
14
|
+
- New rake task: `grainery:generate_data_only` - Harvest data without schema dump
|
|
15
|
+
- New rake task: `grainery:generate_raw` - Harvest data without anonymization
|
|
16
|
+
- Schema loading option in `load_seeds` method with `load_schema` parameter
|
|
17
|
+
- **Production environment protection**: Destructive tasks now blocked in production by default
|
|
18
|
+
- Affects: `grainery:load`, `grainery:load_with_schema`, and all `test:db:*` tasks
|
|
19
|
+
- Override with `GRAINERY_ALLOW_PRODUCTION=true` environment variable
|
|
20
|
+
- Includes 5-second safety countdown when override is used
|
|
21
|
+
- **Data Anonymization**: Automatic anonymization of sensitive fields using Faker gem
|
|
22
|
+
- **Automatic field detection**: `grainery:init_config` scans database schema and auto-detects anonymizable fields
|
|
23
|
+
- **Scoped anonymization**: Support for table-specific and database-specific field configuration using `table.field` or `database.table.field` notation
|
|
24
|
+
- Automatic scoping when duplicate field names are detected across multiple tables
|
|
25
|
+
- Default anonymization for common fields (email, name, phone, address, SSN, credit cards, passwords, tokens, API keys)
|
|
26
|
+
- Greek-specific document anonymization:
|
|
27
|
+
- `greek_vat` - Greek VAT number (AFM - 9 digits)
|
|
28
|
+
- `greek_amka` - Greek Social Security Number (11 digits: DDMMYY + 5 digits)
|
|
29
|
+
- `greek_personal_number` - Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM)
|
|
30
|
+
- `greek_ada` - Greek ADA/Diavgeia Decision Number (15 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ")
|
|
31
|
+
- `greek_adam` - Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012")
|
|
32
|
+
- `iban` - Greek IBAN (27 characters)
|
|
33
|
+
- `date_of_birth` - Anonymizes birth dates while preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
|
|
34
|
+
- Selective anonymization: Use `skip` value to preserve real data for specific non-sensitive fields
|
|
35
|
+
- All fake values automatically respect database column size limits
|
|
36
|
+
- Configurable anonymization via `anonymize_fields` in `config/grainery.yml`
|
|
37
|
+
|
|
38
|
+
### Changed
|
|
39
|
+
- `grainery:generate` and `grainery:generate_all` now dump database schemas by default
|
|
40
|
+
- All generation tasks (`generate`, `generate_all`, `generate_data_only`) now anonymize data by default
|
|
41
|
+
- `grainery:load` continues to load only seed data (schemas optional)
|
|
42
|
+
- Schema dump includes table definitions, columns with attributes, and indexes
|
|
43
|
+
- Updated test framework from RSpec to Minitest
|
|
44
|
+
- Rails dependency updated to support versions 6.1 through 8.x
|
|
45
|
+
- Ruby requirement updated to >= 3.2.0
|
|
46
|
+
- Anonymization happens during harvest, not during load
|
|
47
|
+
- Generated seed files contain anonymized data safe for version control
|
|
48
|
+
|
|
49
|
+
### Security
|
|
50
|
+
- Added production environment safeguards to prevent accidental data loss
|
|
51
|
+
- Destructive operations require explicit opt-in via environment variable in production
|
|
52
|
+
- Sensitive data is now anonymized by default during harvest
|
|
53
|
+
- Safe to commit anonymized seed files to version control
|
|
54
|
+
- Lookup tables are not anonymized (reference data)
|
|
55
|
+
|
|
56
|
+
### Technical Details
|
|
57
|
+
- Schema dumps use `ActiveRecord::Schema.define` format
|
|
58
|
+
- Each database connection gets its own schema file
|
|
59
|
+
- Schema loading occurs before seed data when enabled
|
|
60
|
+
- Automatic detection and skip of internal Rails tables (schema_migrations, ar_internal_metadata)
|
|
61
|
+
- Anonymization uses Faker gem for realistic fake data
|
|
62
|
+
- Automatic field detection uses pattern matching on column names (email, phone, ssn, afm, amka, ada, adam, etc.)
|
|
63
|
+
- Detected anonymizable fields are automatically added to `config/grainery.yml` during initialization
|
|
64
|
+
- Scoped field resolution with priority: `database.table.field` > `table.field` > `field`
|
|
65
|
+
- When duplicate field names detected, automatically uses scoped configuration
|
|
66
|
+
- Type-aware anonymization respects column data types and size limits
|
|
67
|
+
- String fields automatically truncated to match column maximum length
|
|
68
|
+
- Numeric fields maintain their data type
|
|
69
|
+
- Date of birth anonymization preserves age categories while protecting actual birth dates
|
|
70
|
+
|
|
8
71
|
## [0.1.0] - 2025-10-01
|
|
9
72
|
|
|
10
73
|
### Added
|
|
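The scoped field resolution listed under Technical Details (`database.table.field` > `table.field` > `field`) can be pictured as a most-specific-first lookup. This is a minimal sketch under stated assumptions: `resolve_anonymizer` is a hypothetical name, the configuration is assumed to be a flat hash with string keys as in `config/grainery.yml`, and returning `nil` for both `skip` and unconfigured fields is a simplification chosen here, not confirmed behavior of the gem.

```ruby
# Hypothetical lookup mirroring the documented priority:
#   database.table.field > table.field > field
# Returns the anonymization method name, or nil when the real value should be kept.
def resolve_anonymizer(config, database:, table:, field:)
  candidates = [
    "#{database}.#{table}.#{field}", # most specific
    "#{table}.#{field}",
    field.to_s                       # global fallback
  ]
  key = candidates.find { |candidate| config.key?(candidate) }
  return nil if key.nil?

  method_name = config[key]
  method_name == "skip" ? nil : method_name
end

# Example configuration (keys follow the README's notation).
config = {
  "email"             => "email",
  "users.address"     => "address",
  "companies.address" => "skip"
}

resolve_anonymizer(config, database: "primary", table: "users", field: "address")     # => "address"
resolve_anonymizer(config, database: "primary", table: "companies", field: "address") # => nil
resolve_anonymizer(config, database: "primary", table: "posts", field: "email")       # => "email"
```

Trying the keys from most to least specific keeps the global entry as a fallback while letting a single scoped entry override it for one table or one database.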
@@ -30,4 +93,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|
|
30
93
|
- `test:db:clean` - Truncate all test tables
|
|
31
94
|
- `test:db:stats` - Show test database statistics
|
|
32
95
|
|
|
96
|
+
[0.2.0]: https://github.com/mpantel/grainery/releases/tag/v0.2.0
|
|
33
97
|
[0.1.0]: https://github.com/mpantel/grainery/releases/tag/v0.1.0
|
data/README.md
CHANGED
|
@@ -2,17 +2,23 @@
|
|
|
2
2
|
|
|
3
3
|
Database seed storage system for Rails applications. Extract database records and generate seed files organized by database with automatic dependency resolution. Like a grainery stores grain, this gem stores and organizes your database seeds.
|
|
4
4
|
|
|
5
|
+
> **Note:** This gem was developed with assistance from [Claude](https://claude.ai), Anthropic's AI assistant. Claude helped with code generation, documentation, and testing strategies throughout the development process.
|
|
6
|
+
|
|
7
|
+
> **⚠️ Development Status:** This gem is in active development and does not yet have a comprehensive test suite. While the core functionality has been tested manually, automated tests are planned for future releases. Use with caution in production environments.
|
|
8
|
+
|
|
5
9
|
## Features
|
|
6
10
|
|
|
7
11
|
- ✅ Automatic database detection
|
|
8
12
|
- ✅ Dependency-aware loading (topological sort)
|
|
9
13
|
- ✅ Multi-database support
|
|
14
|
+
- ✅ Database schema dumping for all related databases
|
|
10
15
|
- ✅ Configurable per project
|
|
11
16
|
- ✅ Preserves custom seeds
|
|
12
17
|
- ✅ One seed file per table
|
|
13
18
|
- ✅ Clean separation of concerns
|
|
14
19
|
- ✅ Supports SQL Server, MySQL, PostgreSQL
|
|
15
20
|
- ✅ Test database management tasks
|
|
21
|
+
- ✅ Rails 6.1 - 8.x support
|
|
16
22
|
|
|
17
23
|
## Installation
|
|
18
24
|
|
|
@@ -36,27 +42,48 @@ bundle install
|
|
|
36
42
|
rake grainery:init_config
|
|
37
43
|
```
|
|
38
44
|
|
|
39
|
-
This auto-detects
|
|
45
|
+
This auto-detects:
|
|
46
|
+
- All databases and model base classes
|
|
47
|
+
- Anonymizable fields in your database schema (email, phone, SSN, Greek documents, etc.)
|
|
48
|
+
- Creates `config/grainery.yml` with detected configuration
|
|
40
49
|
|
|
41
50
|
### 2. Harvest Data
|
|
42
51
|
|
|
43
52
|
```bash
|
|
44
|
-
# Harvest with limit (100 records per table)
|
|
53
|
+
# Harvest with limit (100 records per table) + schema dump + anonymization
|
|
45
54
|
rake grainery:generate
|
|
46
55
|
|
|
47
|
-
# Harvest ALL records (use with caution)
|
|
56
|
+
# Harvest ALL records + schema dump + anonymization (use with caution)
|
|
48
57
|
rake grainery:generate_all
|
|
58
|
+
|
|
59
|
+
# Harvest data only (no schema dump) + anonymization
|
|
60
|
+
rake grainery:generate_data_only
|
|
61
|
+
|
|
62
|
+
# Harvest without anonymization (raw production data - use with extreme caution!)
|
|
63
|
+
rake grainery:generate_raw
|
|
49
64
|
```
|
|
50
65
|
|
|
66
|
+
**Note:** By default, sensitive fields are anonymized using Faker. Configure anonymization in `config/grainery.yml`.
|
|
67
|
+
|
|
51
68
|
### 3. Load Seeds
|
|
52
69
|
|
|
53
70
|
```bash
|
|
71
|
+
# Load seeds only (blocked in production)
|
|
54
72
|
rake grainery:load
|
|
73
|
+
|
|
74
|
+
# Load schemas + seeds (blocked in production)
|
|
75
|
+
rake grainery:load_with_schema
|
|
76
|
+
|
|
77
|
+
# Override production protection (use with extreme caution!)
|
|
78
|
+
GRAINERY_ALLOW_PRODUCTION=true rake grainery:load
|
|
55
79
|
```
|
|
56
80
|
|
|
81
|
+
**Note:** Loading tasks are blocked in production by default to prevent accidental data loss.
|
|
82
|
+
|
|
57
83
|
This loads:
|
|
58
|
-
1.
|
|
59
|
-
2.
|
|
84
|
+
1. Database schemas (if using `load_with_schema`)
|
|
85
|
+
2. Harvested seeds (in dependency order)
|
|
86
|
+
3. Custom seeds from `db/seeds.rb` (last)
|
|
60
87
|
|
|
61
88
|
## Directory Structure
|
|
62
89
|
|
|
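The production protection described for the load tasks (blocked by default, overridable with `GRAINERY_ALLOW_PRODUCTION=true`, 5-second countdown on override) could be structured roughly as follows. This is a sketch, not the gem's implementation: the method names, the error message, and the injectable `out` and `sleeper` parameters are assumptions made for illustration and testability.

```ruby
# Hypothetical check: destructive tasks are allowed outside production,
# and in production only when the override variable is exactly "true".
def destructive_task_allowed?(rails_env, env = ENV)
  return true unless rails_env.to_s == "production"

  env["GRAINERY_ALLOW_PRODUCTION"] == "true"
end

# Hypothetical guard to call at the top of a destructive rake task.
# Raises when blocked; prints a countdown when the override is in effect.
def guard_destructive_task!(rails_env, env: ENV, countdown: 5, out: $stdout,
                            sleeper: ->(seconds) { sleep(seconds) })
  unless destructive_task_allowed?(rails_env, env)
    raise "Blocked in production. Set GRAINERY_ALLOW_PRODUCTION=true to override."
  end
  return unless rails_env.to_s == "production"

  countdown.downto(1) do |remaining|
    out.puts "Running destructive task in production in #{remaining}..."
    sleeper.call(1)
  end
end
```

Keeping the decision (`destructive_task_allowed?`) separate from the side effects (raising and sleeping) makes the rule easy to check without waiting for the countdown.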
@@ -65,12 +92,15 @@ db/
|
|
|
65
92
|
├── grainery/ # Harvested seeds (auto-generated)
|
|
66
93
|
│ ├── load_order.txt # Load order respecting dependencies
|
|
67
94
|
│ ├── primary/ # Primary database
|
|
95
|
+
│ │ ├── schema.rb # Database schema dump
|
|
68
96
|
│ │ ├── users.rb
|
|
69
97
|
│ │ ├── posts.rb
|
|
70
98
|
│ │ └── comments.rb
|
|
71
99
|
│ ├── other/ # Other database
|
|
100
|
+
│ │ ├── schema.rb # Database schema dump
|
|
72
101
|
│ │ └── projects.rb
|
|
73
102
|
│ └── banking/ # Banking database
|
|
103
|
+
│ ├── schema.rb # Database schema dump
|
|
74
104
|
│ └── employees.rb
|
|
75
105
|
└── seeds.rb # Custom seeds (loaded last)
|
|
76
106
|
```
|
|
@@ -97,6 +127,51 @@ database_connections:
|
|
|
97
127
|
|
|
98
128
|
# Lookup tables (harvest all records)
|
|
99
129
|
lookup_tables: []
|
|
130
|
+
|
|
131
|
+
# Field anonymization (column_name => faker_method)
|
|
132
|
+
# Set to empty hash {} to disable anonymization
|
|
133
|
+
anonymize_fields:
|
|
134
|
+
email: email
|
|
135
|
+
first_name: first_name
|
|
136
|
+
last_name: last_name
|
|
137
|
+
name: name
|
|
138
|
+
phone: phone_number
|
|
139
|
+
phone_number: phone_number
|
|
140
|
+
address: address
|
|
141
|
+
street_address: street_address
|
|
142
|
+
city: city
|
|
143
|
+
state: state
|
|
144
|
+
zip: zip_code
|
|
145
|
+
zip_code: zip_code
|
|
146
|
+
postal_code: zip_code
|
|
147
|
+
ssn: ssn
|
|
148
|
+
credit_card: credit_card_number
|
|
149
|
+
password: password
|
|
150
|
+
token: token
|
|
151
|
+
api_key: api_key
|
|
152
|
+
secret: secret
|
|
153
|
+
iban: iban
|
|
154
|
+
vat_number: greek_vat
|
|
155
|
+
afm: greek_vat
|
|
156
|
+
amka: greek_amka
|
|
157
|
+
social_security_number: greek_amka
|
|
158
|
+
ssn_greek: greek_amka
|
|
159
|
+
personal_number: greek_personal_number
|
|
160
|
+
personal_id: greek_personal_number
|
|
161
|
+
afm_extended: greek_personal_number
|
|
162
|
+
ada: greek_ada
|
|
163
|
+
diavgeia_id: greek_ada
|
|
164
|
+
decision_number: greek_ada
|
|
165
|
+
adam: greek_adam
|
|
166
|
+
adam_number: greek_adam
|
|
167
|
+
procurement_id: greek_adam
|
|
168
|
+
date_of_birth: date_of_birth
|
|
169
|
+
birth_date: date_of_birth
|
|
170
|
+
dob: date_of_birth
|
|
171
|
+
birthdate: date_of_birth
|
|
172
|
+
identity_number: identity_number
|
|
173
|
+
id_number: identity_number
|
|
174
|
+
national_id: identity_number
|
|
100
175
|
```
|
|
101
176
|
|
|
102
177
|
## Available Rake Tasks
|
|
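To illustrate how an `anonymize_fields` mapping like the one above might be applied to a harvested record, here is a minimal sketch. It is not grainery's code: the gem uses Faker, whereas this example substitutes stdlib stand-in generators so it runs without extra gems. `GENERATORS`, `anonymize_record`, the `limits` hash, and the choice to leave `nil` values untouched are all assumptions.

```ruby
require "securerandom"

# Stand-in generators keyed by the method names used in anonymize_fields.
# Illustrative only; the real gem delegates to Faker.
GENERATORS = {
  "email"     => -> { "user_#{SecureRandom.hex(4)}@example.org" },
  "greek_vat" => -> { Array.new(9) { rand(10) }.join }, # 9 digits, the AFM length
  "token"     => -> { SecureRandom.alphanumeric(32) }
}.freeze

# record:  column name => original value
# mapping: column name => method name (or "skip")
# limits:  column name => maximum string length (hypothetical column metadata)
def anonymize_record(record, mapping, limits = {})
  record.to_h do |column, value|
    method_name = mapping[column]
    generator = GENERATORS[method_name]
    # Keep the original when unmapped, skipped, or nil (nil handling is an assumption).
    next [column, value] if value.nil? || method_name == "skip" || generator.nil?

    fake = generator.call
    limit = limits[column]
    fake = fake[0, limit] if limit && fake.length > limit # respect the column size
    [column, fake]
  end
end
```

For example, with `mapping = { "email" => "email", "company_name" => "skip" }`, the `email` value is replaced while `company_name` and any unmapped columns pass through unchanged.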
@@ -107,15 +182,24 @@ lookup_tables: []
|
|
|
107
182
|
# Initialize configuration
|
|
108
183
|
rake grainery:init_config
|
|
109
184
|
|
|
110
|
-
# Harvest data (with limit)
|
|
185
|
+
# Harvest data (with limit) + schema dump + anonymization
|
|
111
186
|
rake grainery:generate
|
|
112
187
|
|
|
113
|
-
# Harvest ALL records
|
|
188
|
+
# Harvest ALL records + schema dump + anonymization
|
|
114
189
|
rake grainery:generate_all
|
|
115
190
|
|
|
191
|
+
# Harvest data only (no schema dump) + anonymization
|
|
192
|
+
rake grainery:generate_data_only
|
|
193
|
+
|
|
194
|
+
# Harvest without anonymization (raw production data)
|
|
195
|
+
rake grainery:generate_raw
|
|
196
|
+
|
|
116
197
|
# Load harvested + custom seeds
|
|
117
198
|
rake grainery:load
|
|
118
199
|
|
|
200
|
+
# Load schemas + seeds + custom seeds
|
|
201
|
+
rake grainery:load_with_schema
|
|
202
|
+
|
|
119
203
|
# Clean grainery directory
|
|
120
204
|
rake grainery:clean
|
|
121
205
|
```
|
|
@@ -177,7 +261,33 @@ lookup_tables:
|
|
|
177
261
|
- categories
|
|
178
262
|
```
|
|
179
263
|
|
|
180
|
-
##
|
|
264
|
+
## File Formats
|
|
265
|
+
|
|
266
|
+
### Schema File Format
|
|
267
|
+
|
|
268
|
+
Each database gets a schema dump:
|
|
269
|
+
|
|
270
|
+
```ruby
|
|
271
|
+
# Schema dump for primary database
|
|
272
|
+
# Generated: 2025-10-01 10:30:00
|
|
273
|
+
# Adapter: postgresql
|
|
274
|
+
|
|
275
|
+
ActiveRecord::Schema.define do
|
|
276
|
+
|
|
277
|
+
create_table "users", force: :cascade do |t|
|
|
278
|
+
t.string "email", null: false
|
|
279
|
+
t.string "name"
|
|
280
|
+
t.boolean "active", default: true
|
|
281
|
+
t.datetime "created_at", null: false
|
|
282
|
+
t.datetime "updated_at", null: false
|
|
283
|
+
end
|
|
284
|
+
|
|
285
|
+
add_index "users", ["email"], unique: true
|
|
286
|
+
|
|
287
|
+
end
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
### Seed File Format
|
|
181
291
|
|
|
182
292
|
Each table gets its own seed file:
|
|
183
293
|
|
|
@@ -221,33 +331,225 @@ Setting.create!(key: 'app_name', value: 'My App')
|
|
|
221
331
|
|
|
222
332
|
### Development
|
|
223
333
|
```bash
|
|
224
|
-
# Harvest production-like data for development
|
|
334
|
+
# Harvest production-like data for development with schemas
|
|
225
335
|
rake grainery:generate
|
|
226
|
-
rake grainery:
|
|
336
|
+
rake grainery:load_with_schema
|
|
227
337
|
```
|
|
228
338
|
|
|
229
339
|
### Testing
|
|
230
340
|
```bash
|
|
231
|
-
# Create test fixtures
|
|
341
|
+
# Create test fixtures with schemas
|
|
232
342
|
rake grainery:generate
|
|
233
|
-
# In test setup, load
|
|
343
|
+
# In test setup, load schemas and seeds
|
|
344
|
+
rake grainery:load_with_schema
|
|
234
345
|
```
|
|
235
346
|
|
|
236
347
|
### Staging
|
|
237
348
|
```bash
|
|
238
|
-
# Harvest production data (anonymized)
|
|
349
|
+
# Harvest production data (anonymized) with schemas
|
|
239
350
|
rake grainery:generate_all
|
|
240
351
|
# Deploy to staging
|
|
241
|
-
# Load on staging server
|
|
242
|
-
rake grainery:
|
|
352
|
+
# Load on staging server with full schema
|
|
353
|
+
rake grainery:load_with_schema
|
|
354
|
+
```
|
|
355
|
+
|
|
356
|
+
### Cross-Database Migration
|
|
357
|
+
```bash
|
|
358
|
+
# Export from one database system
|
|
359
|
+
rake grainery:generate_all # Captures schema + data
|
|
360
|
+
|
|
361
|
+
# Import to another database system
|
|
362
|
+
rake grainery:load_with_schema # Recreates schema + loads data
|
|
243
363
|
```
|
|
244
364
|
|
|
245
365
|
## Safety Features
|
|
246
366
|
|
|
247
|
-
1. **
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
|
|
367
|
+
1. **Production Environment Protection**: Destructive tasks (load, load_with_schema, test:db:*) are blocked in production
|
|
368
|
+
- Requires explicit `GRAINERY_ALLOW_PRODUCTION=true` environment variable to override
|
|
369
|
+
- Includes 5-second countdown when override is used
|
|
370
|
+
2. **Separate Directories**: Harvested seeds never touch `db/seeds.rb`
|
|
371
|
+
3. **Dependency Order**: Foreign keys respected automatically
|
|
372
|
+
4. **Custom Preservation**: Your `db/seeds.rb` always loads last
|
|
373
|
+
5. **Clean Command**: `rake grainery:clean` removes only harvested files
|
|
374
|
+
6. **Optional Schema Loading**: Schemas only load when explicitly requested
|
|
375
|
+
7. **Per-Database Schemas**: Each database gets isolated schema file
|
|
376
|
+
|
|
377
|
+
### Production Safety Matrix
|
|
378
|
+
|
|
379
|
+
**Safe Operations (Read-Only):**
|
|
380
|
+
- ✅ `rake grainery:generate` - Harvests data, no modifications
|
|
381
|
+
- ✅ `rake grainery:generate_all` - Harvests all data, no modifications
|
|
382
|
+
- ✅ `rake grainery:generate_data_only` - Harvests data only, no modifications
|
|
383
|
+
- ✅ `rake grainery:init_config` - Creates config file only
|
|
384
|
+
- ✅ `rake grainery:clean` - Deletes harvested files only (not database data)
|
|
385
|
+
|
|
386
|
+
**Destructive Operations (Blocked by Default):**
|
|
387
|
+
- ❌ `rake grainery:load` - Inserts data into database
|
|
388
|
+
- ❌ `rake grainery:load_with_schema` - Modifies schema AND inserts data
|
|
389
|
+
- ❌ `rake test:db:*` - All test database operations
|
|
390
|
+
|
|
391
|
+
**Recommendation:**
|
|
392
|
+
- Harvesting in production is safe and useful for creating staging/development fixtures
|
|
393
|
+
- Loading in production should be tested thoroughly in staging first due to lack of automated test coverage
|
|
394
|
+
- Always review generated files before loading into any environment
|
|
395
|
+
|
|
396
|
+
## Data Anonymization
|
|
397
|
+
|
|
398
|
+
✅ **Built-in Anonymization:** Grainery automatically anonymizes sensitive fields using the Faker gem during harvest.
|
|
399
|
+
|
|
400
|
+
### Automatic Detection
|
|
401
|
+
|
|
402
|
+
When you run `rake grainery:init_config`, Grainery automatically:
|
|
403
|
+
1. Scans all database tables and columns
|
|
404
|
+
2. Detects fields that should be anonymized based on naming patterns
|
|
405
|
+
3. Adds them to `config/grainery.yml` with appropriate anonymization methods
|
|
406
|
+
|
|
407
|
+
Detected patterns include: `email`, `phone`, `address`, `ssn`, `password`, `token`, Greek documents (`afm`, `amka`, `ada`, `adam`), dates of birth, and more.
|
|
408
|
+
|
|
409
|
+
### How It Works
|
|
410
|
+
|
|
411
|
+
When harvesting, Grainery automatically replaces sensitive field values with fake data:
|
|
412
|
+
|
|
413
|
+
```ruby
|
|
414
|
+
# Original production data:
|
|
415
|
+
{ email: "john.doe@company.com", name: "John Doe", phone: "555-1234" }
|
|
416
|
+
|
|
417
|
+
# Anonymized in seed files:
|
|
418
|
+
{ email: "jane_smith@example.org", name: "Sarah Johnson", phone: "555-987-6543" }
|
|
419
|
+
```
|
|
420
|
+
|
|
421
|
+
### Configuration
|
|
422
|
+
|
|
423
|
+
The `config/grainery.yml` file is automatically populated with detected fields during initialization. You can customize it as needed:
|
|
424
|
+
|
|
425
|
+
```yaml
|
|
426
|
+
anonymize_fields:
|
|
427
|
+
# Global field configuration (applies to all tables)
|
|
428
|
+
email: email # Uses Faker::Internet.email
|
|
429
|
+
first_name: first_name # Uses Faker::Name.first_name
|
|
430
|
+
last_name: last_name # Uses Faker::Name.last_name
|
|
431
|
+
name: name # Uses Faker::Name.name
|
|
432
|
+
phone: phone_number # Uses Faker::PhoneNumber.phone_number
|
|
433
|
+
ssn: ssn # Uses Faker::IDNumber.valid
|
|
434
|
+
|
|
435
|
+
# Table-specific configuration (when same field appears in multiple tables)
|
|
436
|
+
users.address: address # Only anonymize address in users table
|
|
437
|
+
companies.address: skip # Don't anonymize address in companies table
|
|
438
|
+
|
|
439
|
+
# Database.table-specific configuration (most specific)
|
|
440
|
+
primary.users.email: email # Only for users table in primary database
|
|
441
|
+
other.contacts.email: email # Only for contacts table in other database
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
**Scoping Priority:**
|
|
445
|
+
1. `database.table.field` (highest priority - most specific)
|
|
446
|
+
2. `table.field` (medium priority - table-specific)
|
|
447
|
+
3. `field` (lowest priority - global)
|
|
448
|
+
|
|
449
|
+
When a field name appears in multiple tables, Grainery automatically uses scoped names during detection.
|
|
450
|
+
|
|
451
|
+
### Disabling Anonymization
|
|
452
|
+
|
|
453
|
+
**Option 1: Disable completely**
|
|
454
|
+
|
|
455
|
+
```yaml
|
|
456
|
+
# Set to empty hash
|
|
457
|
+
anonymize_fields: {}
|
|
458
|
+
```
|
|
459
|
+
|
|
460
|
+
**Option 2: Use raw generation task**
|
|
461
|
+
|
|
462
|
+
```bash
|
|
463
|
+
rake grainery:generate_raw # Harvests without anonymization
|
|
464
|
+
```
|
|
465
|
+
|
|
466
|
+
**Option 3: Skip specific fields**
|
|
467
|
+
|
|
468
|
+
To keep real values for specific fields while anonymizing others, set them to `skip`:
|
|
469
|
+
|
|
470
|
+
```yaml
|
|
471
|
+
anonymize_fields:
|
|
472
|
+
email: email # Will be anonymized
|
|
473
|
+
name: name # Will be anonymized
|
|
474
|
+
company_name: skip # Will keep real value (not anonymized)
|
|
475
|
+
department: skip # Will keep real value (not anonymized)
|
|
476
|
+
phone: phone_number # Will be anonymized
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
This is useful when you need to preserve certain non-sensitive reference data while still protecting personal information.
|
|
480
|
+
|
|
481
|
+
### Supported Faker Methods
|
|
482
|
+
|
|
483
|
+
**Personal Information:**
|
|
484
|
+
- `email` - Fake email addresses
|
|
485
|
+
- `first_name`, `last_name`, `name` - Fake names
|
|
486
|
+
- `phone_number` - Fake phone numbers
|
|
487
|
+
- `address`, `street_address` - Fake addresses
|
|
488
|
+
- `city`, `state`, `zip_code`, `postal_code` - Fake location data
|
|
489
|
+
- `date_of_birth` - Fake date of birth preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
|
|
490
|
+
|
|
491
|
+
**Financial & Identity:**
|
|
492
|
+
- `ssn` - Fake social security numbers
|
|
493
|
+
- `credit_card_number` - Fake credit card numbers
|
|
494
|
+
- `iban` - Fake Greek IBAN (27 characters: GR + check digits + bank code + account number, auto-truncates to column size)
|
|
495
|
+
- `greek_vat` - Fake Greek VAT number (AFM - 9 digits, adjusts to column size)
|
|
496
|
+
- `greek_amka` - Fake Greek AMKA/Social Security Number (11 digits: DDMMYY + 5 digits, adjusts to column size)
|
|
497
|
+
- `greek_personal_number` - Fake Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM, e.g., "12A123456789", adjusts to column size)
|
|
498
|
+
- `greek_ada` - Fake Greek ADA/Diavgeia Decision Number (15 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ", adjusts to column size)
|
|
499
|
+
- `greek_adam` - Fake Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012", adjusts to column size)
|
|
500
|
+
- `identity_number` - Fake identity number (alphanumeric format, adjusts to column size)
|
|
501
|
+
|
|
502
|
+
**Security:**
|
|
503
|
+
- `password` - Fake passwords (auto-truncates to column size)
|
|
504
|
+
- `token` - Random alphanumeric strings (defaults to 32 characters, adjusts to column size)
|
|
505
|
+
- `api_key` - Random alphanumeric strings (defaults to 40 characters, adjusts to column size)
|
|
506
|
+
- `secret` - Random alphanumeric strings (defaults to 64 characters, adjusts to column size)
|
|
507
|
+
|
|
508
|
+
### Custom Field Mapping
|
|
509
|
+
|
|
510
|
+
Add your own field mappings to anonymize custom columns:
|
|
511
|
+
|
|
512
|
+
```yaml
|
|
513
|
+
anonymize_fields:
|
|
514
|
+
# Global mappings
|
|
515
|
+
employee_id: ssn
|
|
516
|
+
mobile: phone_number
|
|
517
|
+
home_address: address
|
|
518
|
+
work_email: email
|
|
519
|
+
tax_id: ssn
|
|
520
|
+
bank_account: iban
|
|
521
|
+
tin: greek_vat
|
|
522
|
+
social_insurance: greek_amka
|
|
523
|
+
citizen_id: greek_personal_number
|
|
524
|
+
passport_number: identity_number
|
|
525
|
+
diavgeia_decision: greek_ada
|
|
526
|
+
procurement_number: greek_adam
|
|
527
|
+
birth_date: date_of_birth
|
|
528
|
+
|
|
529
|
+
# Scoped examples for duplicate fields
|
|
530
|
+
users.status: skip # Don't anonymize status in users
|
|
531
|
+
orders.status: skip # Don't anonymize status in orders
|
|
532
|
+
primary.employees.department: skip # Department in primary.employees
|
|
533
|
+
other.staff.department: skip # Department in other.staff
|
|
534
|
+
|
|
535
|
+
# Skip anonymization for non-sensitive fields
|
|
536
|
+
company_name: skip
|
|
537
|
+
department: skip
|
|
538
|
+
job_title: skip
|
|
539
|
+
```
|
|
540
|
+
|
|
541
|
+
### Important Notes
|
|
542
|
+
|
|
543
|
+
- Anonymization happens **during harvest**, not during load
|
|
544
|
+
- Generated seed files contain anonymized data
|
|
545
|
+
- Original production data is never modified
|
|
546
|
+
- Safe to commit anonymized seed files to version control
|
|
547
|
+
- Lookup tables are not anonymized (reference data)
|
|
548
|
+
- Anonymization can be disabled per-harvest using `generate_raw` task
|
|
549
|
+
- **Respects database constraints**: Fake values are automatically truncated to match column size limits
|
|
550
|
+
- **Type-aware**: String fields respect their maximum length, numeric fields maintain their data type
|
|
551
|
+
- **Selective anonymization**: Use `skip` to preserve real values for specific fields while anonymizing others
|
|
552
|
+
- **Scoped configuration**: When the same field name appears in multiple tables, use `table.field` or `database.table.field` notation for table-specific or database-specific anonymization
|
|
251
553
|
|
|
252
554
|
## Best Practices
|
|
253
555
|
|
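The `date_of_birth` behavior described above (approximate age preserved within ±2 years, minimum age 18) can be sketched as a random shift followed by a clamp. This is an illustration under assumptions, not the gem's implementation: `anonymize_date_of_birth` is a hypothetical name, two years is approximated as 730 days, and the clamp is applied to every record, which may differ from how grainery treats minors.

```ruby
require "date"

# Hypothetical sketch: shift a birth date by up to 730 days in either direction,
# then clamp so that the resulting age on `today` is at least 18.
def anonymize_date_of_birth(dob, today: Date.today, rng: Random.new)
  max_shift_days = 2 * 365
  shifted = dob + rng.rand(-max_shift_days..max_shift_days)
  latest_allowed = today << (18 * 12) # the date exactly 18 years before today
  [shifted, latest_allowed].min
end
```

Passing a seeded `Random` (for example `rng: Random.new(42)`) makes the output repeatable, which can help when comparing two harvests of the same data.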