pumice 0.7.1
- checksums.yaml +7 -0
- data/LICENSE +21 -0
- data/README.md +962 -0
- data/lib/pumice/analyzer.rb +67 -0
- data/lib/pumice/configuration.rb +330 -0
- data/lib/pumice/dsl.rb +267 -0
- data/lib/pumice/dump_generator.rb +115 -0
- data/lib/pumice/empty_sanitizer.rb +38 -0
- data/lib/pumice/generators/column_classification.rb +58 -0
- data/lib/pumice/generators/install_generator.rb +33 -0
- data/lib/pumice/generators/sanitizer_generator.rb +107 -0
- data/lib/pumice/generators/templates/initializer.rb.erb +51 -0
- data/lib/pumice/generators/templates/sanitizer.rb.erb +32 -0
- data/lib/pumice/generators/templates/sanitizer_spec.rb.erb +15 -0
- data/lib/pumice/generators/test_generator.rb +20 -0
- data/lib/pumice/helpers.rb +141 -0
- data/lib/pumice/logger.rb +105 -0
- data/lib/pumice/output.rb +81 -0
- data/lib/pumice/progress.rb +42 -0
- data/lib/pumice/pruner.rb +157 -0
- data/lib/pumice/pruning/analyzer.rb +207 -0
- data/lib/pumice/railtie.rb +15 -0
- data/lib/pumice/rspec.rb +101 -0
- data/lib/pumice/runner.rb +66 -0
- data/lib/pumice/safe_scrubber.rb +341 -0
- data/lib/pumice/sanitizer.rb +336 -0
- data/lib/pumice/soft_scrubbing/policy.rb +104 -0
- data/lib/pumice/soft_scrubbing.rb +101 -0
- data/lib/pumice/validator.rb +113 -0
- data/lib/pumice/version.rb +5 -0
- data/lib/pumice.rb +23 -0
- data/lib/tasks/db_scrub.rake +616 -0
- metadata +132 -0
data/README.md
ADDED
|
@@ -0,0 +1,962 @@
|
|
|
1
|
+
# Pumice
|
|
2
|
+
|
|
3
|
+
Database PII sanitization for Rails. Declarative scrubbing, pruning, and safe export of PII-free database copies. All operations are **non-destructive** to the source database unless you explicitly opt into destructive mode.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Table of Contents
|
|
8
|
+
|
|
9
|
+
- [Quick Start](#quick-start)
|
|
10
|
+
- [Sanitizer DSL](#sanitizer-dsl)
|
|
11
|
+
- [Verification](#verification)
|
|
12
|
+
- [Helpers](#helpers)
|
|
13
|
+
- [Rake Tasks](#rake-tasks)
|
|
14
|
+
- [Configuration](#configuration)
|
|
15
|
+
- [Safe Scrub](#safe-scrub)
|
|
16
|
+
- [Pruning](#pruning)
|
|
17
|
+
- [Soft Scrubbing](#soft-scrubbing)
|
|
18
|
+
- [Testing](#testing)
|
|
19
|
+
- [Materialized Views](#materialized-views)
|
|
20
|
+
- [Gotchas](#gotchas)
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Quick Start
|
|
25
|
+
|
|
26
|
+
### 1. Install
|
|
27
|
+
|
|
28
|
+
```ruby
|
|
29
|
+
# Gemfile
|
|
30
|
+
gem 'pumice'
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
bundle install
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
### 2. Create the initializer
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
rails generate pumice:install
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
This creates [config/initializers/pumice.rb](config/initializers/pumice.rb) with commented defaults. The defaults work out of the box — customize later as needed.
|
|
44
|
+
|
|
45
|
+
### 3. Generate a sanitizer (and test)
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
rails generate pumice:sanitizer User # sanitizer + test (if applicable)
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
This inspects your model's columns and generates `app/sanitizers/user_sanitizer.rb` — PII columns get `scrub` stubs, credentials get flagged, and safe columns get `keep` declarations. Every `scrub` block raises `NotImplementedError` until you define the logic.
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
rails generate pumice:sanitizer User # stubs (you define scrub logic)
|
|
55
|
+
rails generate pumice:sanitizer User --defaults # pre-filled with Faker defaults
|
|
56
|
+
rails generate pumice:sanitizer User --no-test # skip test generation
|
|
57
|
+
rails generate pumice:test User # test only (backfill existing sanitizers)
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
If your project uses RSpec (detected by the presence of `spec/`), a spec is generated with `have_scrubbed` and `have_kept` matchers. See [Testing](#testing) for the full RSpec integration.
|
|
61
|
+
|
|
62
|
+
### 4. Review and adjust the generated sanitizer
|
|
63
|
+
|
|
64
|
+
Without `--defaults`, scrub blocks require you to define the logic:
|
|
65
|
+
|
|
66
|
+
```ruby
|
|
67
|
+
# app/sanitizers/user_sanitizer.rb
|
|
68
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
69
|
+
# PII - scrub with fake data
|
|
70
|
+
scrub(:email) { raise NotImplementedError }
|
|
71
|
+
scrub(:first_name) { raise NotImplementedError }
|
|
72
|
+
scrub(:last_name) { raise NotImplementedError }
|
|
73
|
+
|
|
74
|
+
# Credentials - clear sensitive data
|
|
75
|
+
scrub(:encrypted_password) { raise NotImplementedError }
|
|
76
|
+
|
|
77
|
+
# Non-PII - safe to keep
|
|
78
|
+
keep :roles, :active
|
|
79
|
+
end
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
With `--defaults`, blocks are pre-filled with smart Faker logic:
|
|
83
|
+
|
|
84
|
+
```ruby
|
|
85
|
+
# app/sanitizers/user_sanitizer.rb (--defaults)
|
|
86
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
87
|
+
scrub(:email) { fake_email(record) }
|
|
88
|
+
scrub(:first_name) { Faker::Name.first_name }
|
|
89
|
+
scrub(:last_name) { Faker::Name.last_name }
|
|
90
|
+
scrub(:encrypted_password) { nil }
|
|
91
|
+
|
|
92
|
+
keep :roles, :active
|
|
93
|
+
end
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
| Column name contains | Pre-filled scrubbing definition |
|
|
97
|
+
|---|---|
|
|
98
|
+
| `email` | `fake_email(record)` (nil-safe when nullable) |
|
|
99
|
+
| `phone`, `call_number` | `fake_phone` (nil-safe when nullable) |
|
|
100
|
+
| `first_name` | `Faker::Name.first_name` |
|
|
101
|
+
| `last_name` | `Faker::Name.last_name` |
|
|
102
|
+
| `name`, `display_name`, `full_name` | `Faker::Name.name` |
|
|
103
|
+
| `address`, `street` | `Faker::Address.street_address` |
|
|
104
|
+
| `city` | `Faker::Address.city` |
|
|
105
|
+
| `state` | `Faker::Address.state_abbr` |
|
|
106
|
+
| `zip` | `Faker::Address.zip` |
|
|
107
|
+
| `username`, `login` | `"user_#{record.id}"` |
|
|
108
|
+
| `bio`, `description`, `notes` | `match_length(value, use: :paragraph)` |
|
|
109
|
+
| other `text` columns | `match_length(value, use: :paragraph)` |
|
|
110
|
+
| other `string` columns | `Faker::Lorem.word` |
|
|
111
|
+
| **Credentials** (`password`, `token`, `secret`, `key`, `encrypted`, `oauth`, etc.) | `nil` |
|
|
112
|
+
|
|
113
|
+
### 5. Run it
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
# Preview what would change (no writes)
|
|
117
|
+
rake db:scrub:test
|
|
118
|
+
|
|
119
|
+
# Generate a scrubbed database dump (source untouched)
|
|
120
|
+
rake db:scrub:generate
|
|
121
|
+
|
|
122
|
+
# Or copy-and-scrub to a separate database
|
|
123
|
+
SOURCE_DATABASE_URL=postgres://prod/myapp \
|
|
124
|
+
TARGET_DATABASE_URL=postgres://local/myapp_dev \
|
|
125
|
+
rake db:scrub:safe
|
|
126
|
+
|
|
127
|
+
# Or destructively scrub the attached database (WARNING!)
|
|
128
|
+
rake db:scrub:all
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
That's it. Pumice auto-discovers sanitizers in `app/sanitizers/` and auto-registers them by class name (`UserSanitizer` → `users`).
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Sanitizer DSL
|
|
136
|
+
|
|
137
|
+
Each sanitizer handles one ActiveRecord model. Place them in `app/sanitizers/`.
|
|
138
|
+
|
|
139
|
+
### `scrub(column, &block)`
|
|
140
|
+
|
|
141
|
+
Define how to replace a PII column. The block receives the original value and has access to `record` (the ActiveRecord instance) and all [helpers](#helpers).
|
|
142
|
+
|
|
143
|
+
```ruby
|
|
144
|
+
scrub(:first_name) { Faker::Name.first_name }
|
|
145
|
+
scrub(:bio) { |value| match_length(value, use: :paragraph) }
|
|
146
|
+
scrub(:notes) { |value| value.present? ? Faker::Lorem.sentence : nil }
|
|
147
|
+
scrub(:email) { fake_email(record, domain: 'test.example') }
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
### `keep(*columns)`
|
|
151
|
+
|
|
152
|
+
Mark columns as non-PII. No changes applied. *Note: `id`, `created_at`, and `updated_at` are kept automatically — you never need to declare them.*
|
|
153
|
+
|
|
154
|
+
```ruby
|
|
155
|
+
keep :role, :status
|
|
156
|
+
```
|
|
157
|
+
|
|
158
|
+
### `keep_undefined_columns!`
|
|
159
|
+
|
|
160
|
+
Keeps all columns not explicitly defined via `scrub` or `keep`. **Bypasses PII review.** Use only during initial development. Disable globally with:
|
|
161
|
+
|
|
162
|
+
```ruby
|
|
163
|
+
Pumice.configure { |c| c.allow_keep_undefined_columns = false }
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
### Referencing other attributes in scrub blocks
|
|
167
|
+
|
|
168
|
+
**Bare names** return scrubbed values. **`raw(:attribute_name)`** returns original database values.
|
|
169
|
+
|
|
170
|
+
```ruby
|
|
171
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
172
|
+
scrub(:first_name) { Faker::Name.first_name }
|
|
173
|
+
scrub(:last_name) { Faker::Name.last_name }
|
|
174
|
+
scrub(:display_name) { "#{first_name} #{last_name}" } # scrubbed values
|
|
175
|
+
scrub(:email) { "#{raw(:first_name)}.#{raw(:last_name)}@example.test".downcase } # original values
|
|
176
|
+
|
|
177
|
+
# ...
|
|
178
|
+
end
|
|
179
|
+
```
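To make the difference between bare names and `raw` concrete, here is a minimal plain-Ruby sketch. It is not Pumice's implementation: `ScrubContext`, its `scrubbed` method, and the hash-based "record" are hypothetical stand-ins, and the rules return fixed strings instead of Faker output so the result is predictable.

```ruby
# Hypothetical stand-in for the object that scrub blocks run against.
class ScrubContext
  def initialize(original, rules)
    @original = original # attribute name => original value
    @rules = rules       # attribute name => scrub block
    @cache = {}          # scrubbed values, computed once per attribute
  end

  # Original database value, never scrubbed.
  def raw(name)
    @original.fetch(name)
  end

  # Scrubbed value; attributes without a rule are returned unchanged.
  def scrubbed(name)
    return @cache[name] if @cache.key?(name)
    rule = @rules[name]
    @cache[name] = rule ? instance_exec(@original[name], &rule) : @original.fetch(name)
  end

  # A bare attribute name inside a block resolves to the scrubbed value.
  def method_missing(name, *args)
    return scrubbed(name) if args.empty? && @original.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @original.key?(name) || super
  end
end

rules = {
  first_name:   proc { "Jane" },
  last_name:    proc { "Doe" },
  display_name: proc { "#{first_name} #{last_name}" },
  email:        proc { "#{raw(:first_name)}.#{raw(:last_name)}@example.test".downcase }
}
original = { first_name: "Alice", last_name: "Smith",
             display_name: "Alice Smith", email: "alice@gmail.com" }

context = ScrubContext.new(original, rules)
context.scrubbed(:display_name) # => "Jane Doe"
context.scrubbed(:email)        # => "alice.smith@example.test"
```

Note that building an email from `raw` values, as in the README example, carries the original name into the scrubbed output, so it is only appropriate when that is intended.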
|
|
180
|
+
|
|
181
|
+
### Model binding
|
|
182
|
+
|
|
183
|
+
Inferred from class name by default — `UserSanitizer` automatically binds to `User`, so `sanitizes` is optional when the naming convention matches. Use it when the class name doesn't map directly to the model:
|
|
184
|
+
|
|
185
|
+
```ruby
|
|
186
|
+
class LegacyUserDataSanitizer < Pumice::Sanitizer
|
|
187
|
+
sanitizes :users # binds to User
|
|
188
|
+
end
|
|
189
|
+
|
|
190
|
+
class AdminUserSanitizer < Pumice::Sanitizer
|
|
191
|
+
sanitizes :admin_users, class_name: 'Admin::User' # namespaced model
|
|
192
|
+
end
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
### Friendly names
|
|
196
|
+
|
|
197
|
+
Controls the name used in rake tasks. Default: class name underscored and pluralized.
|
|
198
|
+
|
|
199
|
+
```ruby
|
|
200
|
+
class TutorSessionFeedbackSanitizer < Pumice::Sanitizer
|
|
201
|
+
friendly_name 'feedback' # rake 'db:scrub:only[feedback]'
|
|
202
|
+
end
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
| Class Name | Default | Custom |
|
|
206
|
+
|---|---|---|
|
|
207
|
+
| `UserSanitizer` | `users` | - |
|
|
208
|
+
| `TutorSessionFeedbackSanitizer` | `tutor_session_feedbacks` | `feedback` |
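The default naming rule in the table above can be sketched in plain Ruby. This is an illustration only, not Pumice's code: `default_friendly_name` is a hypothetical helper, and the pluralization is a naive "append s" rule, whereas a Rails app would more likely use ActiveSupport's `underscore` and `pluralize`.

```ruby
# Hypothetical helper: derive a rake-friendly name from a sanitizer class name.
def default_friendly_name(class_name)
  base = class_name.sub(/Sanitizer\z/, "")
  # CamelCase -> snake_case
  snake = base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  # Naive pluralization (assumption: regular nouns only)
  "#{snake}s"
end

default_friendly_name("UserSanitizer")                 # => "users"
default_friendly_name("TutorSessionFeedbackSanitizer") # => "tutor_session_feedbacks"
```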
|
|
209
|
+
|
|
210
|
+
### `prune` (pre-step, not terminal)
|
|
211
|
+
|
|
212
|
+
Removes matching records **before** record-by-record scrubbing. Survivors get scrubbed. Use when you have records worth keeping but need to reduce the dataset first.
|
|
213
|
+
|
|
214
|
+
```ruby
|
|
215
|
+
class EmailLogSanitizer < Pumice::Sanitizer
|
|
216
|
+
prune { where(created_at: ..1.year.ago) } # delete old logs
|
|
217
|
+
|
|
218
|
+
scrub(:email) { fake_email(record) } # scrub the rest
|
|
219
|
+
scrub(:body) { |value| match_length(value, use: :paragraph) }
|
|
220
|
+
|
|
221
|
+
# ...
|
|
222
|
+
end
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
Convenience shorthands:
|
|
226
|
+
|
|
227
|
+
```ruby
|
|
228
|
+
prune_older_than 1.year
|
|
229
|
+
prune_older_than 90.days, column: :updated_at
|
|
230
|
+
prune_older_than "2024-01-01"
|
|
231
|
+
prune_newer_than 30.days
|
|
232
|
+
```
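As a rough mental model of "prune first, then scrub the survivors", the following plain-Ruby sketch works on an array of hashes instead of a database table. The method name `prune_then_scrub`, the `now:` parameter, and the fixed email format are assumptions made for the example, not part of Pumice.

```ruby
DAY = 24 * 60 * 60 # seconds in a day

# Hypothetical sketch: drop records older than the cutoff, then scrub the rest.
def prune_then_scrub(records, older_than:, now:, column: :created_at)
  cutoff = now - older_than
  survivors = records.reject { |record| record[column] < cutoff }
  survivors.map { |record| record.merge(email: "user_#{record[:id]}@example.test") }
end

now = Time.utc(2025, 1, 1)
logs = [
  { id: 1, email: "old@gmail.com", created_at: now - 400 * DAY },
  { id: 2, email: "new@gmail.com", created_at: now - 10 * DAY }
]

result = prune_then_scrub(logs, older_than: 365 * DAY, now: now)
result.map { |record| record[:email] } # => ["user_2@example.test"]
```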
|
|
233
|
+
|
|
234
|
+
### Bulk operations (terminal)
|
|
235
|
+
|
|
236
|
+
For tables where you want records **gone**, not scrubbed. The entire sanitizer is just the deletion — no `scrub`/`keep` declarations needed, and no scrubbing runs after. Use `destroy_all` over `delete_all` when you need ActiveRecord callbacks (e.g., `dependent: :destroy` associations).
|
|
237
|
+
|
|
238
|
+
```ruby
|
|
239
|
+
# Wipe entire table (fastest, resets auto-increment)
|
|
240
|
+
class SessionSanitizer < Pumice::Sanitizer
|
|
241
|
+
truncate!
|
|
242
|
+
end
|
|
243
|
+
|
|
244
|
+
# SQL DELETE with optional scope (no callbacks)
|
|
245
|
+
class VersionSanitizer < Pumice::Sanitizer
|
|
246
|
+
sanitizes :versions, class_name: 'PaperTrail::Version'
|
|
247
|
+
|
|
248
|
+
delete_all { where(item_type: %w[User Message]) }
|
|
249
|
+
end
|
|
250
|
+
|
|
251
|
+
# ActiveRecord destroy with callbacks and dependent associations
|
|
252
|
+
class AttachmentSanitizer < Pumice::Sanitizer
|
|
253
|
+
destroy_all { where(attachable_id: nil) }
|
|
254
|
+
end
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
### When to use what
|
|
258
|
+
|
|
259
|
+
The key distinction: `prune` is a pre-step that scrubs survivors, while bulk operations are terminal — deletion is the entire sanitizer.
|
|
260
|
+
|
|
261
|
+
| Goal | DSL | Scrubs survivors? |
|
|
262
|
+
|---|---|:---:|
|
|
263
|
+
| Delete old records, scrub the rest | `prune` / `prune_[older\|newer]_than` | Yes |
|
|
264
|
+
| Wipe entire table | `truncate!` | No |
|
|
265
|
+
| Delete matching records (fast, no callbacks) | `delete_all { scope }` | No |
|
|
266
|
+
| Delete with callbacks/associations | `destroy_all { scope }` | No |
|
|
267
|
+
|
|
268
|
+
### Programmatic usage
|
|
269
|
+
|
|
270
|
+
```ruby
|
|
271
|
+
UserSanitizer.sanitize(user) # returns hash, does not persist
|
|
272
|
+
UserSanitizer.sanitize(user, :email) # returns single scrubbed value
|
|
273
|
+
UserSanitizer.scrub!(user) # persists all scrubbed values
|
|
274
|
+
UserSanitizer.scrub!(user, :email) # persists single scrubbed value
|
|
275
|
+
UserSanitizer.scrub_all! # batch: prune → scrub → verify
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
---
|
|
279
|
+
|
|
280
|
+
## Verification
|
|
281
|
+
|
|
282
|
+
Post-operation checks declared inside a sanitizer definition. All verification raises `Pumice::VerificationError` on failure and is skipped during dry runs.
|
|
283
|
+
|
|
284
|
+
### Table-level
|
|
285
|
+
|
|
286
|
+
```ruby
|
|
287
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
288
|
+
scrub(:email) { Faker::Internet.email }
|
|
289
|
+
|
|
290
|
+
verify_all "No real emails should remain" do
|
|
291
|
+
where("email LIKE '%@gmail.com'").none?
|
|
292
|
+
end
|
|
293
|
+
end
|
|
294
|
+
```
|
|
295
|
+
|
|
296
|
+
The `verify_all` block runs in model scope (`User.instance_exec`). Return truthy for success.
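The "runs in model scope, truthy means success" contract can be illustrated without Rails. In this sketch, `EmailTable` is a hypothetical in-memory stand-in for a model, `run_verification` is an invented helper, and `VerificationError` is a local class that only mirrors the name of `Pumice::VerificationError`.

```ruby
# Local stand-in for the error class named above.
class VerificationError < StandardError; end

# Hypothetical in-memory "model" that exposes the emails it holds.
class EmailTable
  attr_reader :emails

  def initialize(emails)
    @emails = emails
  end
end

# Evaluate the check with `scope` as self; raise unless it returns truthy.
def run_verification(scope, message, &check)
  return true if scope.instance_exec(&check)
  raise VerificationError, message
end

table = EmailTable.new(["user_1@example.test", "user_2@example.test"])
run_verification(table, "No real emails should remain") do
  emails.none? { |email| email.end_with?("@gmail.com") }
end
# => true
```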
|
|
297
|
+
|
|
298
|
+
### Per-record
|
|
299
|
+
|
|
300
|
+
```ruby
|
|
301
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
302
|
+
scrub(:email) { Faker::Internet.email }
|
|
303
|
+
|
|
304
|
+
verify_each "Email should be scrubbed" do |record|
|
|
305
|
+
!record.email.match?(/gmail|yahoo|hotmail/)
|
|
306
|
+
end
|
|
307
|
+
end
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
### Inline (bulk operations)
|
|
311
|
+
|
|
312
|
+
Bulk operations accept a `verify: true` option that uses a default check after execution:
|
|
313
|
+
|
|
314
|
+
```ruby
|
|
315
|
+
class AuditLogSanitizer < Pumice::Sanitizer
|
|
316
|
+
truncate!(verify: true) # verifies count.zero?
|
|
317
|
+
end
|
|
318
|
+
|
|
319
|
+
class VersionSanitizer < Pumice::Sanitizer
|
|
320
|
+
delete_all(verify: true) { where(item_type: 'User') } # verifies scope.none?
|
|
321
|
+
end
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
### Default verification for bulk operations
|
|
325
|
+
|
|
326
|
+
| Operation | Default check |
|
|
327
|
+
|---|---|
|
|
328
|
+
| `truncate!` | `count.zero?` |
|
|
329
|
+
| `delete_all` (no scope) | `count.zero?` |
|
|
330
|
+
| `delete_all { scope }` | `scope.none?` |
|
|
331
|
+
| `destroy_all` (no scope) | `count.zero?` |
|
|
332
|
+
| `destroy_all { scope }` | `scope.none?` |
|
|
333
|
+
|
|
334
|
+
Call `verify_all` without a block on a bulk sanitizer to use the default. Calling `verify_all` without a block on a non-bulk sanitizer raises `ArgumentError`.
|
|
335
|
+
|
|
336
|
+
### Custom verification policy
|
|
337
|
+
|
|
338
|
+
```ruby
|
|
339
|
+
Pumice.configure do |config|
|
|
340
|
+
config.default_verification = ->(_model_class, operation) {
|
|
341
|
+
case operation[:type]
|
|
342
|
+
when :truncate
|
|
343
|
+
-> { count.zero? }
|
|
344
|
+
when :delete, :destroy
|
|
345
|
+
operation[:scope] || -> { count.zero? }
|
|
346
|
+
end
|
|
347
|
+
}
|
|
348
|
+
end
|
|
349
|
+
```
|
|
350
|
+
|
|
351
|
+
---
|
|
352
|
+
|
|
353
|
+
## Helpers
|
|
354
|
+
|
|
355
|
+
All helpers are available inside `scrub` blocks via `Pumice::Helpers`.
|
|
356
|
+
|
|
357
|
+
### Quick reference
|
|
358
|
+
|
|
359
|
+
| Helper | Example output | Notes |
|
|
360
|
+
|---|---|---|
|
|
361
|
+
| `fake_email(record)` | `user_123@example.test` | Deterministic per record |
|
|
362
|
+
| `fake_phone(digits = 10)` | `5551234567` | Random digits |
|
|
363
|
+
| `fake_password(pwd = 'password123', cost: 4)` | `$2a$04$...` | BCrypt hash |
|
|
364
|
+
| `fake_id(id, prefix: 'ID')` | `ID000123` | Zero-padded |
|
|
365
|
+
| `match_length(value, use: :sentence)` | `Lorem ipsum...` | Matches original length |
|
|
366
|
+
| `fake_json(value, preserve_keys: true, keep: [])` | `{"name": "lorem"}` | Structure-preserving |
|
|
367
|
+
|
|
368
|
+
### `fake_email`
|
|
369
|
+
|
|
370
|
+
Deterministic — same record always produces the same email across runs. Important for data consistency.
|
|
371
|
+
|
|
372
|
+
```ruby
|
|
373
|
+
class UserSanitizer < Pumice::Sanitizer
|
|
374
|
+
sanitizes :users
|
|
375
|
+
|
|
376
|
+
scrub(:email) { fake_email(record) } # user_123@example.test
|
|
377
|
+
scrub(:email) { fake_email(record, domain: 'test.example.com') } # user_123@test.example.com
|
|
378
|
+
scrub(:contact_email) {
|
|
379
|
+
fake_email(prefix: 'contact', unique_id: record.unique_id) # contact_789@example.test
|
|
380
|
+
}
|
|
381
|
+
end
|
|
382
|
+
```
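Determinism here simply means the output depends only on stable inputs such as the record's ID. A minimal sketch of that idea follows; `deterministic_email` and `DemoRecord` are hypothetical names, and the real `fake_email` may derive its value differently.

```ruby
# Hypothetical record type with just an id.
DemoRecord = Struct.new(:id)

# Same record in, same address out: no randomness involved.
def deterministic_email(record, prefix: "user", domain: "example.test")
  "#{prefix}_#{record.id}@#{domain}"
end

record = DemoRecord.new(123)
deterministic_email(record)                      # => "user_123@example.test"
deterministic_email(record, prefix: "contact")   # => "contact_123@example.test"
```

Because the value is stable across runs, rows that reference each other by email keep matching after every scrub.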
|
|
383
|
+
|
|
384
|
+
### `fake_password`
|
|
385
|
+
|
|
386
|
+
Uses low BCrypt cost (4) for speed. All scrubbed users get the same password so devs can log in.
|
|
387
|
+
|
|
388
|
+
```ruby
|
|
389
|
+
scrub(:encrypted_password) { fake_password } # hash of default 'password123'
|
|
390
|
+
scrub(:encrypted_password) { fake_password('testpass') } # custom password
|
|
391
|
+
```
|
|
392
|
+
|
|
393
|
+
### `match_length`
|
|
394
|
+
|
|
395
|
+
Generates text approximating the original value's length. Respects column constraints.
|
|
396
|
+
|
|
397
|
+
```ruby
|
|
398
|
+
scrub(:bio) { |value| match_length(value, use: :paragraph) }
|
|
399
|
+
scrub(:code) { |value| match_length(value, use: :characters) } # random alphanumeric
|
|
400
|
+
scrub(:title) { |value| match_length(value, use: -> { Faker::Book.title }) } # custom generator
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
| Generator | Best for |
|
|
404
|
+
|---|---|
|
|
405
|
+
| `:sentence` | Bios, comments (default) |
|
|
406
|
+
| `:paragraph` | Long-form content |
|
|
407
|
+
| `:word` | Short fields, names |
|
|
408
|
+
| `:characters` | Codes, tokens |
|
|
409
|
+
| `-> { ... }` | Any custom Faker or logic |
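One simple way to produce replacement text of a matching length is to repeat filler text and cut it to size. The sketch below does exactly that; it is an assumption about the general technique, not how `match_length` works internally, and `same_length_text` and `FILLER` are made-up names.

```ruby
FILLER = "lorem ipsum dolor sit amet "

# Hypothetical sketch: filler text cut to exactly the original length.
def same_length_text(value)
  return value if value.nil? || value.empty?
  repeats = (value.length.to_f / FILLER.length).ceil
  (FILLER * repeats)[0, value.length]
end

same_length_text("Hello, world!") # => "lorem ipsum d"
same_length_text(nil)             # => nil
```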
|
|
410
|
+
|
|
411
|
+
### `fake_json`
|
|
412
|
+
|
|
413
|
+
Sanitizes JSON structures. Strings become random words, numbers become `0`, booleans and `nil` are preserved. Structure (nesting depth, array lengths) is always retained.
|
|
414
|
+
|
|
415
|
+
```ruby
|
|
416
|
+
scrub(:preferences) { |value| fake_json(value) } # fake values, keep keys
|
|
417
|
+
scrub(:metadata) { |value| fake_json(value, preserve_keys: false) } # fake keys AND values
|
|
418
|
+
scrub(:config) { |value| fake_json(value, keep: ['api_version']) } # preserve specific key/value pairs
|
|
419
|
+
scrub(:data) { |value| fake_json(value, keep: ['user.profile.email']) } # dot notation for nesting
|
|
420
|
+
```
|
|
421
|
+
|
|
422
|
+
| Option | Keys | Values |
|
|
423
|
+
|---|---|---|
|
|
424
|
+
| `fake_json(value)` | Original | Faked |
|
|
425
|
+
| `fake_json(value, preserve_keys: false)` | Faked | Faked |
|
|
426
|
+
| `fake_json(value, keep: ['path'])` | Original (kept paths preserved) | Faked (kept paths preserved) |
|
|
427
|
+
| `fake_json(value, preserve_keys: false, keep: ['path'])` | Faked (kept paths preserved) | Faked (kept paths preserved) |
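The structure-preserving behaviour described above can be approximated with a short recursive function. This sketch is not the gem's implementation: `scrub_json` is a hypothetical name, it uses the fixed word `"lorem"` instead of random words so the output is predictable, and it only covers the default `preserve_keys: true` case with dot-notation `keep` paths.

```ruby
# Hypothetical sketch: replace leaf values, keep keys, nesting and array lengths.
def scrub_json(value, keep: [], path: nil)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), out|
      child_path = [path, key].compact.join(".")
      # Kept paths retain their original value untouched.
      out[key] = keep.include?(child_path) ? child : scrub_json(child, keep: keep, path: child_path)
    end
  when Array
    value.map { |item| scrub_json(item, keep: keep, path: path) }
  when String  then "lorem"
  when Numeric then 0
  else value # booleans and nil pass through
  end
end

data = {
  "user" => { "profile" => { "email" => "a@b.c", "age" => 30 }, "tags" => ["x", "y"] },
  "active" => true,
  "note" => nil
}

scrub_json(data, keep: ["user.profile.email"])
# => {"user"=>{"profile"=>{"email"=>"a@b.c", "age"=>0}, "tags"=>["lorem", "lorem"]},
#     "active"=>true, "note"=>nil}
```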
|
|
428
|
+
|
|
429
|
+
### Custom helpers
|
|
430
|
+
|
|
431
|
+
Extend `Pumice::Helpers` for project-specific needs:
|
|
432
|
+
|
|
433
|
+
```ruby
|
|
434
|
+
# config/initializers/pumice_helpers.rb
|
|
435
|
+
module Pumice
|
|
436
|
+
module Helpers
|
|
437
|
+
def fake_student_id(record)
|
|
438
|
+
"STU-#{record.student_id}"
|
|
439
|
+
end
|
|
440
|
+
|
|
441
|
+
    def redact(value, show_last: 4)
|
|
442
|
+
      return nil if value.blank?
|
|
443
|
+
      "******#{value.to_s.last(show_last)}" # fixed mask plus the last show_last characters
|
|
444
|
+
end
|
|
445
|
+
end
|
|
446
|
+
end
|
|
447
|
+
```
|
|
448
|
+
|
|
449
|
+
---
|
|
450
|
+
|
|
451
|
+
## Rake Tasks
|
|
452
|
+
|
|
453
|
+
### Inspection
|
|
454
|
+
|
|
455
|
+
```bash
|
|
456
|
+
rake db:scrub:list # list registered sanitizers and their friendly names
|
|
457
|
+
rake db:scrub:lint # check all columns are defined (scrub or keep), exits 1 on issues
|
|
458
|
+
rake db:scrub:validate # check scrubbed DB for PII leaks (real emails, uncleared tokens)
|
|
459
|
+
rake db:scrub:analyze # show top 20 tables by size, row counts for sensitive tables
|
|
460
|
+
```
|
|
461
|
+
|
|
462
|
+
### Safe operations (source never modified)
|
|
463
|
+
|
|
464
|
+
```bash
|
|
465
|
+
rake db:scrub:test # dry run all sanitizers
|
|
466
|
+
rake 'db:scrub:test[users,messages]' # dry run specific sanitizers
|
|
467
|
+
rake db:scrub:generate # create temp DB, scrub, export dump, cleanup
|
|
468
|
+
rake db:scrub:safe # copy to target DB, scrub target (interactive)
|
|
469
|
+
rake 'db:scrub:safe_confirmed[mydb]' # same, but auto-confirmed for CI
|
|
470
|
+
```
|
|
471
|
+
|
|
472
|
+
### ⚠️ Destructive operations (modifies current database) ⚠️
|
|
473
|
+
|
|
474
|
+
The following tasks modify the currently attached database. You will be prompted to confirm, but be warned:
|
|
475
|
+
|
|
476
|
+
```bash
|
|
477
|
+
rake db:scrub:all # scrub current DB in-place (interactive confirmation)
|
|
478
|
+
rake 'db:scrub:only[users,messages]' # scrub specific tables in-place
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
### Progress indicators
|
|
482
|
+
|
|
483
|
+
Long-running operations display progress bars when output is a TTY:
|
|
484
|
+
|
|
485
|
+
```
|
|
486
|
+
Sanitizers: |============================ | 5/7 ETA: 00:12
|
|
487
|
+
Users: |================================== | 980/1024 ETA: 00:02
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
Progress bars are automatically hidden when:
|
|
491
|
+
- `VERBOSE=true` (verbose mode shows per-record detail instead)
|
|
492
|
+
- Output is piped or redirected (non-TTY)
|
|
493
|
+
- The collection is empty
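The three conditions above amount to a single boolean check. A minimal sketch, assuming a hypothetical `show_progress?` helper (the gem's own logic may be organised differently):

```ruby
require "stringio"

# Hypothetical sketch: show a bar only for a TTY, in non-verbose mode, with work to do.
def show_progress?(io, verbose:, total:)
  io.tty? && !verbose && total.positive?
end

# A StringIO is never a TTY, so this behaves like piped or redirected output.
show_progress?(StringIO.new, verbose: false, total: 100) # => false
```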
|
|
494
|
+
|
|
495
|
+
Safe Scrub operations show a numbered step counter:
|
|
496
|
+
|
|
497
|
+
```
|
|
498
|
+
[1/5] Creating fresh target database...
|
|
499
|
+
[2/5] Copying data from source to target...
|
|
500
|
+
```
|
|
501
|
+
|
|
502
|
+
### Environment variables
|
|
503
|
+
|
|
504
|
+
| Variable | Effect |
|
|
505
|
+
|---|---|
|
|
506
|
+
| `DRY_RUN=true` | Log changes without persisting |
|
|
507
|
+
| `VERBOSE=true` | Detailed per-record output (disables progress bars) |
|
|
508
|
+
| `PRUNE=false` | Disable pruning without changing config |
|
|
509
|
+
| `SOURCE_DATABASE_URL` | Source DB for safe scrub |
|
|
510
|
+
| `TARGET_DATABASE_URL` | Target DB for safe scrub |
|
|
511
|
+
| `SCRUBBED_DATABASE_URL` | Alternative to `TARGET_DATABASE_URL` |
|
|
512
|
+
| `EXPORT_PATH` | Path to export scrubbed dump |
|
|
513
|
+
| `EXCLUDE_INDEXES=true` | Exclude indexes/triggers/constraints from dump |
|
|
514
|
+
| `EXCLUDE_MATVIEWS=false` | Include materialized views in dump (excluded by default) |
|
|
515
|
+
|
|
516
|
+
---
|
|
517
|
+
|
|
518
|
+
## Configuration
|
|
519
|
+
|
|
520
|
+
Create an initializer. All settings have sensible defaults — only override what you need.
|
|
521
|
+
|
|
522
|
+
```ruby
|
|
523
|
+
# config/initializers/pumice.rb
|
|
524
|
+
Pumice.configure do |config|
|
|
525
|
+
# Column coverage enforcement (default: true)
|
|
526
|
+
# Raises if a sanitizer doesn't define every column as scrub or keep
|
|
527
|
+
config.strict = true
|
|
528
|
+
|
|
529
|
+
# Tables to report row counts for in db:scrub:analyze (default: [])
|
|
530
|
+
config.sensitive_tables = %w[users messages student_profiles]
|
|
531
|
+
|
|
532
|
+
# Email domains that indicate real PII — validation fails if found (default: [])
|
|
533
|
+
config.sensitive_email_domains = %w[gmail.com yahoo.com hotmail.com]
|
|
534
|
+
end
|
|
535
|
+
```
|
|
536
|
+
|
|
537
|
+
### Full options reference
|
|
538
|
+
|
|
539
|
+
| Option | Default | Description |
|
|
540
|
+
|---|---|---|
|
|
541
|
+
| `verbose` | `false` | Increase console output detail |
|
|
542
|
+
| `strict` | `true` | Raise if sanitizer columns are undefined |
|
|
543
|
+
| `continue_on_error` | `false` | Continue on sanitizer failure vs halt |
|
|
544
|
+
| `allow_keep_undefined_columns` | `true` | Allow `keep_undefined_columns!` DSL |
|
|
545
|
+
| `sensitive_tables` | `[]` | Tables to analyze for row counts |
|
|
546
|
+
| `sensitive_email_domains` | `[]` | Domains indicating real PII |
|
|
547
|
+
| `sensitive_email_model` | `'User'` | Model to query for email validation |
|
|
548
|
+
| `sensitive_email_column` | `'email'` | Column for email lookup |
|
|
549
|
+
| `sensitive_token_columns` | `%w[reset_password_token confirmation_token]` | Token columns to verify are cleared |
|
|
550
|
+
| `sensitive_external_id_columns` | `[]` | External ID columns to verify are cleared |
|
|
551
|
+
| `source_database_url` | `nil` | Source DB for safe scrub (`:auto` to derive from Rails config) |
|
|
552
|
+
| `target_database_url` | `nil` | Target DB for safe scrub |
|
|
553
|
+
| `export_path` | `nil` | Path to export scrubbed dump |
|
|
554
|
+
| `export_format` | `:custom` | `:custom` (pg_dump -Fc) or `:plain` (SQL) |
|
|
555
|
+
| `require_readonly_source` | `false` | Enforce read-only source (error vs warn) |
|
|
556
|
+
| `soft_scrubbing` | `false` | Runtime PII masking — set to hash to enable |
|
|
557
|
+
| `pruning` | `false` | Pre-sanitization record pruning — set to hash to enable |
|
|
558
|
+
|
|
559
|
+
---
|
|
560
|
+
|
|
561
|
+
## Safe Scrub
|
|
562
|
+
|
|
563
|
+
Safe Scrub creates a sanitized copy of your database without modifying the source. This is the recommended workflow for production environments.
|
|
564
|
+
|
|
565
|
+
### Flow
|
|
566
|
+
|
|
567
|
+
```
|
|
568
|
+
rake db:scrub:generate
|
|
569
|
+
├─ Create temp database
|
|
570
|
+
├─ Copy source → temp
|
|
571
|
+
├─ Run global pruning (if configured)
|
|
572
|
+
├─ Run all sanitizers
|
|
573
|
+
├─ Export dump file
|
|
574
|
+
└─ Drop temp database
|
|
575
|
+
|
|
576
|
+
rake db:scrub:safe
|
|
577
|
+
├─ Validate source ≠ target
|
|
578
|
+
├─ Confirm target DB name (interactive or argument)
|
|
579
|
+
├─ Drop and recreate target
|
|
580
|
+
├─ Copy source → target
|
|
581
|
+
├─ Run global pruning
|
|
582
|
+
├─ Run sanitizers
|
|
583
|
+
├─ Verify
|
|
584
|
+
└─ Export (if configured)
|
|
585
|
+
```
|
|
586
|
+
|
|
587
|
+
### Configuration
|
|
588
|
+
|
|
589
|
+
```ruby
|
|
590
|
+
Pumice.configure do |config|
|
|
591
|
+
# Auto-detect source from database.yml (works in Docker dev with zero env vars)
|
|
592
|
+
config.source_database_url = :auto unless Rails.env.production?
|
|
593
|
+
|
|
594
|
+
# Or set explicitly
|
|
595
|
+
# config.source_database_url = ENV['DATABASE_URL']
|
|
596
|
+
|
|
597
|
+
config.target_database_url = ENV['SCRUBBED_DATABASE_URL']
|
|
598
|
+
config.export_path = "tmp/scrubbed_#{Date.today}.dump"
|
|
599
|
+
config.export_format = :custom # :custom (pg_dump -Fc) or :plain (SQL)
|
|
600
|
+
end
|
|
601
|
+
```
|
|
602
|
+
|
|
603
|
+
When `source_database_url` is `:auto`, Pumice derives the URL from `ActiveRecord::Base.connection_db_config`. This means `rake db:scrub:generate` works locally with no env vars.
|
|
604
|
+
|
|
605
|
+
Environment variables (`SOURCE_DATABASE_URL`) always take precedence over config.
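The precedence described here (environment variable first, then config, with `:auto` deferring to the Rails connection) can be sketched as follows. `resolve_source_url` and its keyword arguments are hypothetical; in particular, `auto_url` stands in for whatever URL Pumice derives from `ActiveRecord::Base.connection_db_config`.

```ruby
# Hypothetical sketch of the lookup order for the source database URL.
def resolve_source_url(env:, configured:, auto_url:)
  from_env = env["SOURCE_DATABASE_URL"]
  return from_env unless from_env.nil? || from_env.empty?
  return auto_url if configured == :auto
  configured
end

resolve_source_url(
  env: { "SOURCE_DATABASE_URL" => "postgres://prod/myapp" },
  configured: :auto,
  auto_url: "postgres://localhost/myapp_development"
)
# => "postgres://prod/myapp"
```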
|
|
606
|
+
|
|
607
|
+
### Safety guarantees
|
|
608
|
+
|
|
609
|
+
- Source database is **never modified** — read-only access
|
|
610
|
+
- Target cannot equal `DATABASE_URL` — prevents accidental production writes
|
|
611
|
+
- Source and target must differ — validated at startup
|
|
612
|
+
- Interactive confirmation — must type the target DB name
|
|
613
|
+
- Write-access detection — warns (or errors) if source credentials can write
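The URL and confirmation checks in this list can be pictured as a small validation step that runs before anything is copied. The sketch below is illustrative only: `validate_safe_scrub!` and `database_name` are hypothetical helpers, `ConfigurationError` is a local class that mirrors the name of `Pumice::ConfigurationError`, and URLs are compared as plain strings, which the gem may do more carefully.

```ruby
require "uri"

# Local stand-in for the error class named in the README.
class ConfigurationError < StandardError; end

# Database name is the URL path without the leading slash.
def database_name(url)
  URI.parse(url).path.delete_prefix("/")
end

# Hypothetical pre-flight checks for a copy-and-scrub run.
def validate_safe_scrub!(source_url:, target_url:, current_url:, confirmation:)
  if source_url.to_s.empty? || target_url.to_s.empty?
    raise ConfigurationError, "source and target URLs are required"
  end
  raise ConfigurationError, "source and target must differ" if source_url == target_url
  raise ConfigurationError, "target must not be DATABASE_URL" if target_url == current_url
  unless confirmation == database_name(target_url)
    raise ConfigurationError, "confirmation does not match target database name"
  end
  true
end

validate_safe_scrub!(
  source_url: "postgres://pumice_readonly:readonly_secret@prod-host/myapp_production",
  target_url: "postgres://pumice_writer:writer_secret@scrub-host/myapp_scrubbed",
  current_url: "postgres://app:secret@prod-host/myapp_production",
  confirmation: "myapp_scrubbed"
)
# => true
```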
|
|
614
|
+
|
|
615
|
+
### Read-only source credentials (recommended)
|
|
616
|
+
|
|
617
|
+
```sql
|
|
618
|
+
-- On source (production): read-only
|
|
619
|
+
CREATE ROLE pumice_readonly WITH LOGIN PASSWORD 'readonly_secret';
|
|
620
|
+
GRANT CONNECT ON DATABASE myapp_production TO pumice_readonly;
|
|
621
|
+
GRANT USAGE ON SCHEMA public TO pumice_readonly;
|
|
622
|
+
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pumice_readonly;
|
|
623
|
+
|
|
624
|
+
-- On target: full access
|
|
625
|
+
CREATE ROLE pumice_writer WITH LOGIN PASSWORD 'writer_secret';
|
|
626
|
+
CREATE DATABASE myapp_scrubbed OWNER pumice_writer;
|
|
627
|
+
```
|
|
628
|
+
|
|
629
|
+
```bash
|
|
630
|
+
SOURCE_DATABASE_URL=postgres://pumice_readonly:readonly_secret@prod-host/myapp_production
|
|
631
|
+
TARGET_DATABASE_URL=postgres://pumice_writer:writer_secret@scrub-host/myapp_scrubbed
|
|
632
|
+
```
|
|
633
|
+
|
|
634
|
+
Even if URLs are swapped, the read-only credential cannot modify production.
|
|
635
|
+
|
|
636
|
+
To enforce read-only source (error instead of warning):
|
|
637
|
+
|
|
638
|
+
```ruby
|
|
639
|
+
config.require_readonly_source = true
|
|
640
|
+
```
|
|
641
|
+
|
|
642
|
+
### CI mode
|
|
643
|
+
|
|
644
|
+
```bash
|
|
645
|
+
# Auto-confirmed — argument must match target DB name or the task fails
|
|
646
|
+
rake 'db:scrub:safe_confirmed[myapp_scrubbed]'
|
|
647
|
+
```
|
|
648
|
+
|
|
649
|
+
### Programmatic usage
|
|
650
|
+
|
|
651
|
+
```ruby
|
|
652
|
+
Pumice::SafeScrubber.new(
|
|
653
|
+
source_url: ENV['DATABASE_URL'],
|
|
654
|
+
target_url: ENV['SCRUBBED_DATABASE_URL'],
|
|
655
|
+
export_path: 'tmp/scrubbed.dump',
|
|
656
|
+
confirm: true # skip interactive prompt
|
|
657
|
+
).run
|
|
658
|
+
```
|
|
659
|
+
|
|
660
|
+
### Error types
|
|
661
|
+
|
|
662
|
+
| Error | Cause |
|
|
663
|
+
|---|---|
|
|
664
|
+
| `Pumice::ConfigurationError` | Missing URL, source = target, target = DATABASE_URL, confirmation mismatch |
|
|
665
|
+
| `Pumice::SourceWriteAccessError` | `require_readonly_source = true` and source has write access |
|
|
666
|
+
|
|
667
|
+
---
|
|
668
|
+
|
|
669
|
+
## Pruning
|
|
670
|
+
|
|
671
|
+
Removes old records before sanitization to reduce dataset size. Useful for log tables, audit trails, and event streams.
|
|
672
|
+
|
|
673
|
+
Pumice supports pruning at two levels with a **cascading override** model:
|
|
674
|
+
|
|
675
|
+
- **Global pruning** — configured once in the initializer. Applies a single age-based rule across many tables at once, before any sanitizers run. This is the default policy.
|
|
676
|
+
- **Per-sanitizer `prune`** — defined inside a sanitizer with a custom scope. **Overrides** global pruning for that table. See [`prune` in the Sanitizer DSL](#prune-pre-step-not-terminal).
|
|
677
|
+
|
|
678
|
+
When a sanitizer defines its own `prune`, global pruning skips that table entirely — the sanitizer's prune takes over. Use global pruning for a blanket retention policy and per-sanitizer `prune` to override specific tables with custom scopes.
|
|
679
|
+
|
|
680
|
+
### Analyze first
|
|
681
|
+
|
|
682
|
+
```bash
|
|
683
|
+
rake db:prune:analyze
|
|
684
|
+
|
|
685
|
+
# Customize thresholds
|
|
686
|
+
RETENTION_DAYS=30 MIN_SIZE=50000000 MIN_ROWS=5000 rake db:prune:analyze
|
|
687
|
+
```
|
|
688
|
+
|
|
689
|
+
The analyzer categorizes tables by confidence:
|
|
690
|
+
|
|
691
|
+
- **High**: Log tables, >50% old records, no foreign key dependencies
|
|
692
|
+
- **Medium**: Log tables OR >70% old, no dependencies
|
|
693
|
+
- **Low**: Everything else — review before pruning
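Read literally, the three categories can be expressed as a small decision function. This is one interpretation, not the analyzer's code: `prune_confidence` is a hypothetical name, and the sketch assumes that "no dependencies" applies to both alternatives in the Medium rule.

```ruby
# Hypothetical sketch of the confidence rules listed above.
# old_ratio is the fraction of rows older than the retention window (0.0..1.0).
def prune_confidence(log_table:, old_ratio:, has_dependencies:)
  return :low if has_dependencies
  return :high if log_table && old_ratio > 0.5
  return :medium if log_table || old_ratio > 0.7
  :low
end

prune_confidence(log_table: true,  old_ratio: 0.8, has_dependencies: false) # => :high
prune_confidence(log_table: false, old_ratio: 0.8, has_dependencies: false) # => :medium
prune_confidence(log_table: true,  old_ratio: 0.8, has_dependencies: true)  # => :low
```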
|
|
694
|
+
|
|
695
|
+
### Global pruning configuration
|
|
696
|
+
|
|
697
|
+
```ruby
|
|
698
|
+
Pumice.configure do |config|
|
|
699
|
+
config.pruning = {
|
|
700
|
+
older_than: 90.days, # required (mutually exclusive with newer_than)
|
|
701
|
+
column: :created_at, # default
|
|
702
|
+
except: %w[users messages], # never prune these (mutually exclusive with only)
|
|
703
|
+
|
|
704
|
+
analyzer: {
|
|
705
|
+
table_patterns: %w[portal_session voice_log], # domain-specific log patterns
|
|
706
|
+
min_table_size: 10_000_000, # 10 MB (default)
|
|
707
|
+
min_row_count: 1000 # default
|
|
708
|
+
}
|
|
709
|
+
}
|
|
710
|
+
end
|
|
711
|
+
```
|
|
712
|
+
|
|
713
|
+
### Execution order
|
|
714
|
+
|
|
715
|
+
```
|
|
716
|
+
1. Global prune → delete old records from all eligible tables
|
|
717
|
+
(tables with a sanitizer-level prune are skipped)
|
|
718
|
+
|
|
719
|
+
2. Sanitizers → for each sanitizer, in order:
|
|
720
|
+
a. run sanitizer-level prune, if defined
|
|
721
|
+
b. scrub surviving records
|
|
722
|
+
```
|
|
723
|
+
|
|
724
|
+
The sanitizer-level `prune` replaces global pruning for that table — they never both run on the same table.
|
|
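The override rule can be pictured as simple set arithmetic. The sketch below uses a hypothetical `prune_plan` helper (not part of Pumice) and assumes that `except` only affects global pruning.

```ruby
# Hypothetical sketch: which tables each pruning level handles.
def prune_plan(tables, except: [], sanitizer_prunes: [])
  {
    # Global pruning skips excluded tables and tables whose sanitizer defines `prune`.
    global: tables - except - sanitizer_prunes,
    # Sanitizer-level prune takes over for its own tables.
    per_sanitizer: tables & sanitizer_prunes
  }
end

plan = prune_plan(
  %w[users messages audit_logs events],
  except: %w[users messages],
  sanitizer_prunes: %w[events]
)
p plan
```

The two lists never share a table, which matches the rule that both levels never run on the same table.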
725
|
+
|
|
726
|
+
### Disable at runtime
|
|
727
|
+
|
|
728
|
+
```bash
|
|
729
|
+
PRUNE=false rake db:scrub:generate
|
|
730
|
+
```
|
|
731
|
+
|
|
732
|
+
---
|
|
733
|
+
|
|
734
|
+
## Soft Scrubbing
|
|
735
|
+
|
|
736
|
+
Masks data at read time without modifying the database. Use for runtime access control — e.g., non-admin users see scrubbed PII, admins see real data.
|
|
737
|
+
|
|
738
|
+
### Enable
|
|
739
|
+
|
|
740
|
+
```ruby
|
|
741
|
+
Pumice.configure do |config|
|
|
742
|
+
config.soft_scrubbing = {
|
|
743
|
+
context: :current_user,
|
|
744
|
+
if: ->(record, viewer) { viewer.nil? || !viewer.admin? }
|
|
745
|
+
}
|
|
746
|
+
end
|
|
747
|
+
```
|
|
748
|
+
|
|
749
|
+
When enabled, Pumice prepends an attribute interceptor on `ActiveRecord::Base`. On each attribute read, Pumice checks the policy. If the policy calls for scrubbing, the attribute's `scrub` block runs and its result is returned instead of the stored value. The database is never modified.
|
|
750
|
+
|
|
751
|
+
### Policy options
|
|
752
|
+
|
|
753
|
+
| Option | Behavior |
|
|
754
|
+
|---|---|
|
|
755
|
+
| `if:` | Scrub when lambda returns **true** |
|
|
756
|
+
| `unless:` | Scrub when lambda returns **false** |
|
|
757
|
+
| Neither | Always scrub |
|
|
758
|
+
|
|
759
|
+
Both receive `(record, viewer)`. Set only one of them; if both are given, `if:` takes precedence.
|
|
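The option table can be read as a small decision function. This is a minimal sketch in plain Ruby; `scrub_for?` and its keyword names are hypothetical and are not Pumice's internal API.

```ruby
# Hypothetical sketch of how `if:` / `unless:` decide whether to scrub.
def scrub_for?(record, viewer, if_cond: nil, unless_cond: nil)
  if if_cond
    !!if_cond.call(record, viewer)     # `if:` wins when both are given
  elsif unless_cond
    !unless_cond.call(record, viewer)  # scrub when the lambda is falsy
  else
    true                               # neither option: always scrub
  end
end

Viewer = Struct.new(:admin) do
  def admin?
    admin
  end
end

not_admin = ->(_record, viewer) { viewer.nil? || !viewer.admin? }
puts scrub_for?(:record, Viewer.new(false), if_cond: not_admin)
puts scrub_for?(:record, Viewer.new(true), if_cond: not_admin)
```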
760
|
+
|
|
761
|
+
### Setting viewer context
|
|
762
|
+
|
|
763
|
+
```ruby
|
|
764
|
+
# In ApplicationController
|
|
765
|
+
before_action { Pumice.soft_scrubbing_context = current_user }
|
|
766
|
+
|
|
767
|
+
# Or scoped
|
|
768
|
+
Pumice.with_soft_scrubbing_context(current_user) do
|
|
769
|
+
@users = User.all # reads scrubbed for non-admins
|
|
770
|
+
end
|
|
771
|
+
```
|
|
772
|
+
|
|
773
|
+
The `context:` config option resolves a Symbol through: `record.method` → `Pumice.method` → `Current.method` → `Thread.current[:key]`.
|
|
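The lookup chain can be sketched as a first-match search. `resolve_context` is a hypothetical helper, the `sources` array stands in for `Pumice` and `Current`, and the sketch assumes the first object that responds to the method wins.

```ruby
# Hypothetical sketch of the Symbol lookup chain; not Pumice's implementation.
def resolve_context(name, record, sources = [])
  # Try the record first, then each fallback object in order.
  [record, *sources].each do |source|
    return source.public_send(name) if source.respond_to?(name)
  end
  # Last resort: a thread-local value stored under the same key.
  Thread.current[name]
end

record = Struct.new(:current_user).new("alice")
puts resolve_context(:current_user, record)
```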
774
|
+
|
|
775
|
+
### Accessing original values
|
|
776
|
+
|
|
777
|
+
When soft scrubbing is enabled, attribute reads return scrubbed values. To access the original database value:
|
|
778
|
+
|
|
779
|
+
**Inside sanitizer definitions** — `raw(:*)` and `raw_attr` are available via the sanitizer DSL (see [Referencing other attributes](#referencing-other-attributes-in-scrub-blocks)).
|
|
780
|
+
|
|
781
|
+
**Inside ActiveRecord models** — use `read_attribute(:attr)` or define a helper:
|
|
782
|
+
|
|
783
|
+
```ruby
|
|
784
|
+
class User < ApplicationRecord
|
|
785
|
+
def admin?
|
|
786
|
+
ADMIN_EMAILS.include?(read_attribute(:email))
|
|
787
|
+
end
|
|
788
|
+
|
|
789
|
+
# Or define a convenience method:
|
|
790
|
+
def raw(attr_name)
|
|
791
|
+
if Pumice.soft_scrubbing?
|
|
792
|
+
read_attribute(attr_name)
|
|
793
|
+
else
|
|
794
|
+
@attributes.fetch_value(attr_name.to_s)
|
|
795
|
+
end
|
|
796
|
+
end
|
|
797
|
+
end
|
|
798
|
+
```
|
|
799
|
+
|
|
800
|
+
---
|
|
801
|
+
|
|
802
|
+
## Testing
|
|
803
|
+
|
|
804
|
+
### Setup
|
|
805
|
+
|
|
806
|
+
```ruby
|
|
807
|
+
# spec/rails_helper.rb
|
|
808
|
+
require 'pumice/rspec'
|
|
809
|
+
```
|
|
810
|
+
|
|
811
|
+
This gives you:
|
|
812
|
+
|
|
813
|
+
- **Auto-reset** — `Pumice.reset!` runs before each `type: :sanitizer` spec
|
|
814
|
+
- **Auto-lint** — column coverage is verified automatically; incomplete sanitizers fail before examples run
|
|
815
|
+
- **Path inference** — specs in `spec/sanitizers/` are automatically tagged `type: :sanitizer`
|
|
816
|
+
- **Helpers** — `with_soft_scrubbing` and `without_soft_scrubbing` available in sanitizer specs
|
|
817
|
+
- **Matchers** — `have_scrubbed(:attr)` and `have_kept(:attr)` for verifying sanitizer definitions
|
|
818
|
+
|
|
819
|
+
### Sanitizer specs
|
|
820
|
+
|
|
821
|
+
```ruby
|
|
822
|
+
# spec/sanitizers/user_sanitizer_spec.rb
|
|
823
|
+
RSpec.describe UserSanitizer, type: :sanitizer do
|
|
824
|
+
let(:user) { create(:user, email: 'real@gmail.com', first_name: 'John') }
|
|
825
|
+
|
|
826
|
+
# Column coverage is checked automatically — no need to add a lint test.
|
|
827
|
+
|
|
828
|
+
describe '.sanitize' do
|
|
829
|
+
it 'returns sanitized values without persisting' do
|
|
830
|
+
result = described_class.sanitize(user)
|
|
831
|
+
|
|
832
|
+
expect(result[:email]).to match(/user_\d+@example\.test/)
|
|
833
|
+
expect(user.reload.email).to eq('real@gmail.com')
|
|
834
|
+
end
|
|
835
|
+
end
|
|
836
|
+
|
|
837
|
+
describe '.scrub!' do
|
|
838
|
+
it 'persists sanitized values' do
|
|
839
|
+
described_class.scrub!(user)
|
|
840
|
+
expect(user.reload.email).to match(/user_\d+@example\.test/)
|
|
841
|
+
end
|
|
842
|
+
end
|
|
843
|
+
end
|
|
844
|
+
```
|
|
845
|
+
|
|
846
|
+
To skip auto-lint for a specific sanitizer (e.g., during initial development):
|
|
847
|
+
|
|
848
|
+
```ruby
|
|
849
|
+
RSpec.describe UserSanitizer, type: :sanitizer, lint: false do
|
|
850
|
+
# ...
|
|
851
|
+
end
|
|
852
|
+
```
|
|
853
|
+
|
|
854
|
+
### Soft scrubbing specs
|
|
855
|
+
|
|
856
|
+
```ruby
|
|
857
|
+
RSpec.describe 'User soft scrubbing', type: :sanitizer do
|
|
858
|
+
let(:user) { create(:user, email: 'real@gmail.com') }
|
|
859
|
+
let(:admin) { create(:user, :admin) }
|
|
860
|
+
let(:regular) { create(:user) }
|
|
861
|
+
|
|
862
|
+
it 'scrubs for non-admins' do
|
|
863
|
+
with_soft_scrubbing(viewer: regular, if: ->(r, v) { !v.admin? }) do
|
|
864
|
+
expect(user.email).to match(/user_\d+@example\.test/)
|
|
865
|
+
end
|
|
866
|
+
end
|
|
867
|
+
|
|
868
|
+
it 'shows real data to admins' do
|
|
869
|
+
with_soft_scrubbing(viewer: admin, if: ->(r, v) { !v.admin? }) do
|
|
870
|
+
expect(user.email).to eq('real@gmail.com')
|
|
871
|
+
end
|
|
872
|
+
end
|
|
873
|
+
end
|
|
874
|
+
```
|
|
875
|
+
|
|
876
|
+
### Helpers reference
|
|
877
|
+
|
|
878
|
+
| Helper | Use |
|
|
879
|
+
|---|---|
|
|
880
|
+
| `with_soft_scrubbing(viewer:, if:, unless:)` | Enable soft scrubbing for a block |
|
|
881
|
+
| `without_soft_scrubbing { ... }` | Disable soft scrubbing for a block |
|
|
882
|
+
| `have_scrubbed(:attr)` | Assert a sanitizer defines a scrub rule for `:attr` |
|
|
883
|
+
| `have_kept(:attr)` | Assert a sanitizer marks `:attr` as kept |
|
|
884
|
+
|
|
885
|
+
Both soft scrubbing helpers restore original config after the block, even on error.
|
|
886
|
+
|
|
887
|
+
```ruby
|
|
888
|
+
# Matcher examples
|
|
889
|
+
RSpec.describe UserSanitizer, type: :sanitizer do
|
|
890
|
+
it { expect(described_class).to have_scrubbed(:email) }
|
|
891
|
+
it { expect(described_class).to have_scrubbed(:first_name) }
|
|
892
|
+
it { expect(described_class).to have_kept(:role) }
|
|
893
|
+
end
|
|
894
|
+
```
|
|
895
|
+
|
|
896
|
+
---
|
|
897
|
+
|
|
898
|
+
## Materialized Views
|
|
899
|
+
|
|
900
|
+
Pumice includes rake tasks for managing materialized views, which are relevant during safe scrub since view data is excluded from dumps by default.
|
|
901
|
+
|
|
902
|
+
```bash
|
|
903
|
+
rake db:matviews:list # list all materialized views with sizes
|
|
904
|
+
rake db:matviews:refresh # refresh all materialized views
|
|
905
|
+
rake 'db:matviews:refresh[view1,view2]' # refresh specific views
|
|
906
|
+
```
|
|
907
|
+
|
|
908
|
+
After restoring a scrubbed dump, refresh materialized views to rebuild their data:
|
|
909
|
+
|
|
910
|
+
```bash
|
|
911
|
+
pg_restore -d myapp_dev tmp/scrubbed.dump && rake db:matviews:refresh
|
|
912
|
+
```
|
|
913
|
+
|
|
914
|
+
Set `EXCLUDE_MATVIEWS=false` to include materialized view data in the dump (skipping the need to refresh after restore).
|
|
915
|
+
|
|
916
|
+
---
|
|
917
|
+
|
|
918
|
+
## Gotchas
|
|
919
|
+
|
|
920
|
+
### Strict mode and new columns
|
|
921
|
+
|
|
922
|
+
When `strict: true` (default), adding a column to a model without updating its sanitizer will raise an error on next scrub. Run `rake db:scrub:lint` in CI to catch this early.
|
|
923
|
+
|
|
924
|
+
### Bulk operations skip column validation
|
|
925
|
+
|
|
926
|
+
`truncate!`, `delete_all`, and `destroy_all` don't require `scrub`/`keep` declarations. Strict mode doesn't apply to them.
|
|
927
|
+
|
|
928
|
+
### Faker seeding
|
|
929
|
+
|
|
930
|
+
Pumice seeds Faker with `record.id` before each record. This makes scrubbing **deterministic**: the same record always produces the same fake values, which keeps output consistent across runs.
|
|
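The idea behind id-based seeding can be shown without Faker. The sketch below uses Ruby's built-in `Random` and a hypothetical `fake_email_for` helper. Faker itself documents `Faker::Config.random = Random.new(seed)` for deterministic output; whether Pumice uses exactly that call is not stated here.

```ruby
# Hypothetical sketch: seed a generator with the record id so that
# the same id always yields the same fake value.
def fake_email_for(record_id)
  rng = Random.new(record_id)  # same seed, same sequence of numbers
  "user_#{rng.rand(100_000)}@example.test"
end

first_run  = fake_email_for(42)
second_run = fake_email_for(42)
puts first_run == second_run
```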
931
|
+
|
|
932
|
+
### Protected columns
|
|
933
|
+
|
|
934
|
+
`id`, `created_at`, and `updated_at` are automatically excluded from column coverage checks. You never need to declare them.
|
|
935
|
+
|
|
936
|
+
### Soft scrubbing circular dependency
|
|
937
|
+
|
|
938
|
+
If your policy check reads a scrubbed attribute (e.g., `viewer.admin?` checks `viewer.email`), use `read_attribute(:email)` inside that check instead. Otherwise the policy triggers scrubbing, which triggers the policy again. Pumice includes a recursion guard that falls through to `super` (the real value) on re-entry, so this does not become an infinite loop, but `read_attribute` makes the intent explicit.
|
|
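A re-entry guard of this kind can be sketched with a thread-local flag. `ReentryGuard` and `SketchUser` are hypothetical and only mimic the behavior described above; the real interceptor calls `super` rather than a lambda.

```ruby
# Hypothetical sketch of a recursion guard around a scrubbed reader.
module ReentryGuard
  KEY = :sketch_soft_scrub_active

  def self.guarded(fallback)
    # On re-entry, skip the policy and return the real value.
    return fallback.call if Thread.current[KEY]

    Thread.current[KEY] = true
    begin
      yield
    ensure
      Thread.current[KEY] = false
    end
  end
end

class SketchUser
  def initialize(email)
    @email = email
  end

  # Scrubbed reader: the policy (admin?) itself reads email.
  def email
    ReentryGuard.guarded(-> { @email }) do
      admin? ? @email : "user_1@example.test"
    end
  end

  def admin?
    email.end_with?("@admin.test")  # re-enters email, guard returns the real value
  end
end

puts SketchUser.new("joe@gmail.com").email
```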
939
|
+
|
|
940
|
+
### `source_database_url = :auto`
|
|
941
|
+
|
|
942
|
+
Only works with PostgreSQL. Builds a URL from `ActiveRecord::Base.connection_db_config` components. Returns `nil` for non-PostgreSQL adapters.
|
|
943
|
+
|
|
944
|
+
### Pruning mutual exclusivity
|
|
945
|
+
|
|
946
|
+
- `older_than` and `newer_than` cannot both be set — raises `ArgumentError`
|
|
947
|
+
- `only` and `except` cannot both be set
|
|
948
|
+
- One of `older_than` or `newer_than` is required
|
|
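These rules amount to a small validation step. In the sketch below, `validate_pruning!` is a hypothetical helper; the README only states that the `older_than`/`newer_than` conflict raises `ArgumentError`, so using the same error class for the other two rules is an assumption.

```ruby
# Hypothetical sketch of the pruning option checks; not Pumice's code.
def validate_pruning!(opts)
  if opts[:older_than] && opts[:newer_than]
    raise ArgumentError, "older_than and newer_than are mutually exclusive"
  end
  unless opts[:older_than] || opts[:newer_than]
    # Assumption: the error class for this rule is not given in the README.
    raise ArgumentError, "one of older_than or newer_than is required"
  end
  if opts[:only] && opts[:except]
    # Assumption: same error class as above.
    raise ArgumentError, "only and except are mutually exclusive"
  end
  opts
end

# Durations are plain day counts here to avoid depending on ActiveSupport.
p validate_pruning!(older_than: 90, except: %w[users messages])
```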
949
|
+
|
|
950
|
+
### Global pruning and foreign keys
|
|
951
|
+
|
|
952
|
+
The global pruner skips tables with foreign key dependencies and logs a warning. Per-sanitizer `prune` does **not** check dependencies — that's on you.
|
|
953
|
+
|
|
954
|
+
### Safe scrub connection management
|
|
955
|
+
|
|
956
|
+
Safe Scrub temporarily changes `ActiveRecord::Base.connection_db_config` to operate on the target. It always restores the original connection, even on error. Existing connections to the target are terminated before DROP/CREATE.
|
|
957
|
+
|
|
958
|
+
---
|
|
959
|
+
|
|
960
|
+
## License
|
|
961
|
+
|
|
962
|
+
MIT
|