pg_dump_anonymize 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +20 -3
- data/lib/pg_dump_anonymize/definition.rb +6 -2
- data/lib/pg_dump_anonymize/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 56f41e435e9420140a7969c4d536a6a517836ca5d8d06e5be8bfca127fc939f0
+  data.tar.gz: 016d7a8b99e9f67046e7192856a13eb9bcf5ab018ce7cd61a00ed5b3375783b0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a69b3a188689de3d2636a28ee45e7b7902f5ebea3292f675ff7283c834ff67b6c3550fb3b979bfee4487197eb20c5d71773b92b255023476663824b7aede7ceb
+  data.tar.gz: e04e32850bb4ad0ea99e426e5e6fb222773ea570cac6b305527fe567b6bc80e6db81158a5bcc6ce9b05c49d5b224bf8f87fa4ba2624cf5425f34e277145450a4
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -1,7 +1,9 @@
-#
+# pg_dump_anonymize
 
 Anonymizing pg_dump data isn't always straight forward. This tool helps. It has no dependencies, other than ruby. Another good thing is that sensitive data doesn't need to ever be stored temporarily as this can be used with Unix pipes and the data is passed through as IO data.
 
+`pg_dump_anonymize` does not anonymize any data automatically. It is very much BYOAD (Bring Your Own Anonymizing Definition). Inside your anonymizing definition, you can use any ruby gems you like, such as [faker](https://github.com/faker-ruby/faker). This gem makes applying anonymizing definitions to `pg_dump` output easy.
+
 ## Installation
 
 Add this line to your application's Gemfile:
@@ -20,15 +22,19 @@ Or install it yourself as:
 
 ## Usage
 
+`pg_dump_anonymize` does not anonymize any data by default. You must provide your own anonymizing definition. The gem currently requires the format of output `pg_dump` is plain text (not compressed) and uses the default `COPY` behavior instead of `INSERT INTO`.
+
 Example usage:
 
 ```
 pg_dump --no-privileges --no-owner <database-name> | pg_dump_anonymize -d sample_definition.rb > anonymized_dump.sql
 ```
 
+You can also pipe the anonymized SQL directly into `psql` to avoid intermediate SQL dump files.
+
 ### Definition File
 
-The definition file is any ruby code you'd like, but it
+The definition file is any ruby code you'd like, but it must return a hash with table names as the top level keys and attribute names nested under the table keys. Any faking ruby gems dependencies, and other gem dependencies you may choose to use in your definitions, are gems that will need to be available on the server this is ran on.
 
 Example:
 
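The `COPY` requirement added to the Usage text can be illustrated with a short sketch. In plain-text `pg_dump` output, each table's rows follow a `COPY ... FROM stdin;` header as tab-delimited lines, so a filter can derive column positions from that header. The helper name `column_positions` and the sample table below are hypothetical, and this is not taken from the gem's source; quoted identifiers are not handled.

```ruby
# Hypothetical helper: map column names to positions from a COPY header.
# Illustrative only; not the gem's actual parsing code.
COPY_HEADER = /\ACOPY (?<table>\S+) \((?<columns>.*)\) FROM stdin;\s*\z/

def column_positions(header_line)
  match = COPY_HEADER.match(header_line)
  return nil unless match # not a COPY header (e.g. an INSERT statement)

  names = match[:columns].split(', ').map(&:to_sym)
  { table: match[:table], positions: names.each_with_index.to_h }
end

header = "COPY public.users (id, first_name, email) FROM stdin;\n"
info = column_positions(header)
# info[:table] holds the table name; info[:positions] maps each column to its index.
```

With `INSERT INTO` output there is no such header followed by tab-delimited rows, which is presumably why the README asks for the default `COPY` format.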
@@ -37,6 +43,14 @@ Example:
   table_1: {
     sensitive_field_1: 'some string value',
     sensitive_field_2: -> { rand(100) },
+    sensitive_field_3: -> (original_value, row_context) do
+      if original_value
+        'xxxxx'
+      end.tap { |new_val| row_context[field_3_val] = new_val }
+    end
+    sensitive_field_4: -> (_original_value, row_context) do
+      row_context[:field_3_val] ? 'foo' : 'bar'
+    end
   },
   table_2: {
     sensitive_field_3: nil
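As printed, the added README lines would not load as Ruby: `row_context[field_3_val]` refers to an undefined local, and there is no comma after the first `end`. Below is a minimal sketch of what the example appears to intend, assuming the key was meant to be the symbol `:field_3_val`. Table and field names are the README's placeholders, and the `definition =` assignment exists only so the sketch can be exercised; a real definition file would presumably end with the bare hash.

```ruby
# Sketch of a definition hash, assuming :field_3_val was the intended key.
definition = {
  table_1: {
    sensitive_field_1: 'some string value',
    sensitive_field_2: -> { rand(100) },
    # Two-argument lambdas receive the original value and a per-row hash.
    sensitive_field_3: ->(original_value, row_context) do
      new_val = original_value ? 'xxxxx' : nil
      row_context[:field_3_val] = new_val # share the result with later fields
      new_val
    end,
    # Reads what sensitive_field_3 stored earlier in the same row.
    sensitive_field_4: ->(_original_value, row_context) do
      row_context[:field_3_val] ? 'foo' : 'bar'
    end
  },
  table_2: {
    sensitive_field_3: nil
  }
}
```

Sharing state this way depends on `sensitive_field_3` being processed before `sensitive_field_4` within a row. Also note that the values come from `line.split("\t")` in `definition.rb`, so they are always strings, and a truthiness check on `original_value` will not by itself distinguish a SQL NULL (written as `\N` in `COPY` data).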
@@ -54,7 +68,7 @@ require 'faker'
     first_name: -> { Faker::Name.first_name },
     last_name: -> { Faker::Name.last_name },
     email: -> { Faker::Internet.email },
-    city:
+    city: 'Portland'
   },
   accounts: {
     bank_name: -> { Faker::Bank.name },
@@ -64,6 +78,9 @@ require 'faker'
 }
 ```
 
+## Todo
+- [ ] Write some tests (so far this has been tested manually)
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
data/lib/pg_dump_anonymize/definition.rb
CHANGED
@@ -27,9 +27,13 @@ module PgDumpAnonymize
     # This assumes the line is a tab delimited data line
     def anonymize_line(line)
       values = line.split("\t")
+      row_context = {} # used to share state for a row
       @positional_substitutions.each do |index, val_def|
-
-
+        values[index] = if val_def.is_a?(Proc)
+                          val_def.call(*[values[index], row_context].slice(0, val_def.arity))
+                        else
+                          val_def
+                        end
       end
       values.join("\t")
     end
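The arity-based dispatch in `anonymize_line` can be tried in isolation. The sketch below copies the substitution logic from the hunk into a free-standing method; the name `apply_substitutions`, the explicit `positional_substitutions` argument (standing in for `@positional_substitutions`), and the sample row are assumptions made for illustration.

```ruby
# Stand-alone version of the substitution step, for experimentation only.
def apply_substitutions(line, positional_substitutions)
  values = line.split("\t")
  row_context = {} # shared by every callable applied to this row
  positional_substitutions.each do |index, val_def|
    values[index] =
      if val_def.is_a?(Proc)
        # Pass only as many arguments as the lambda declares (0, 1, or 2).
        val_def.call(*[values[index], row_context].slice(0, val_def.arity))
      else
        val_def # static replacement value
      end
  end
  values.join("\t")
end

substitutions = {
  1 => 'Portland',                                     # static value
  2 => -> { 'redacted' },                              # arity 0
  3 => ->(original) { original.upcase },               # arity 1
  4 => ->(_original, ctx) { ctx[:seen] = true; 'x' }   # arity 2
}

puts apply_substitutions("1\tSeattle\tsecret\tabc\tdef", substitutions)
```

Slicing by `arity` lets zero-argument lambdas from 0.1.0 definitions keep working while newer ones opt in to the original value and the row context. A lambda with optional parameters reports a negative arity, for which `Array#slice` returns `nil`, so it would be called with no arguments.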
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pg_dump_anonymize
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.1
 platform: ruby
 authors:
 - Sean McCleary
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2020-
+date: 2020-12-30 00:00:00.000000000 Z
 dependencies: []
 description: Given the default pg_dump text SQL dump, this can take a simple definition
   of tables and fields to anonymize and efficiently anonymize the dump.