pg_dump_anonymize 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 903be54f2c5727d77dc205f98565863caa7977639c6302a47d9a52f7cdcac15e
-   data.tar.gz: 244fe8be50f46e717bddf6222f46360b2cff2be016285f101a0f9698a10bed6e
+   metadata.gz: 56f41e435e9420140a7969c4d536a6a517836ca5d8d06e5be8bfca127fc939f0
+   data.tar.gz: 016d7a8b99e9f67046e7192856a13eb9bcf5ab018ce7cd61a00ed5b3375783b0
  SHA512:
-   metadata.gz: 1d4a8ef4dc0050fd1fd942e3c4925bfe0a1cbf5145b6b643fc1188e0479fcfb5262e04d90cfae9715ac6f0f3060757ae4da21eb0ea7c72cb6785f7b39290e183
-   data.tar.gz: 0c8f53ddcf4d46c5ea49426059419848aa97205579e99ab7a44eed5f6143bf603c994ccd52a58f3928b2171204e18e33648b95b0a9234e96dae9bebb02eaae78
+   metadata.gz: a69b3a188689de3d2636a28ee45e7b7902f5ebea3292f675ff7283c834ff67b6c3550fb3b979bfee4487197eb20c5d71773b92b255023476663824b7aede7ceb
+   data.tar.gz: e04e32850bb4ad0ea99e426e5e6fb222773ea570cac6b305527fe567b6bc80e6db81158a5bcc6ce9b05c49d5b224bf8f87fa4ba2624cf5425f34e277145450a4
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     pg_dump_anonymize (0.1.0)
+     pg_dump_anonymize (0.1.1)

  GEM
    remote: https://rubygems.org/
data/README.md CHANGED
@@ -1,7 +1,9 @@
- # PgDumpAnonymize
+ # pg_dump_anonymize

  Anonymizing pg_dump data isn't always straightforward. This tool helps. It has no dependencies other than Ruby. Another benefit is that sensitive data never needs to be stored temporarily: the tool works with Unix pipes and passes the data through as an IO stream.

+ `pg_dump_anonymize` does not anonymize any data automatically. It is very much BYOAD (Bring Your Own Anonymizing Definition). Inside your anonymizing definition, you can use any Ruby gems you like, such as [faker](https://github.com/faker-ruby/faker). This gem makes applying anonymizing definitions to `pg_dump` output easy.
+
  ## Installation

  Add this line to your application's Gemfile:
@@ -20,15 +22,19 @@ Or install it yourself as:

  ## Usage

+ `pg_dump_anonymize` does not anonymize any data by default. You must provide your own anonymizing definition. The gem currently requires that the `pg_dump` output be plain text (not compressed) and use the default `COPY` format rather than `INSERT INTO` statements.
+
  Example usage:

  ```
  pg_dump --no-privileges --no-owner <database-name> | pg_dump_anonymize -d sample_definition.rb > anonymized_dump.sql
  ```

+ You can also pipe the anonymized SQL directly into `psql` to avoid intermediate SQL dump files.
+
  ### Definition File

- The definition file can contain any Ruby code you'd like, but it should return a hash with table names as the top-level keys and attribute names nested under the table keys. Any gems your definitions depend on, such as fake-data generators, must be available on the server where this is run.
+ The definition file can contain any Ruby code you'd like, but it must return a hash with table names as the top-level keys and attribute names nested under the table keys. Any gems your definitions depend on, such as fake-data generators, must be available on the server where this is run.

  Example:

@@ -37,6 +43,14 @@ Example:
    table_1: {
      sensitive_field_1: 'some string value',
      sensitive_field_2: -> { rand(100) },
+     sensitive_field_3: -> (original_value, row_context) do
+       if original_value
+         'xxxxx'
+       end.tap { |new_val| row_context[:field_3_val] = new_val }
+     end,
+     sensitive_field_4: -> (_original_value, row_context) do
+       row_context[:field_3_val] ? 'foo' : 'bar'
+     end
    },
    table_2: {
      sensitive_field_3: nil
@@ -54,7 +68,7 @@ require 'faker'
      first_name: -> { Faker::Name.first_name },
      last_name: -> { Faker::Name.last_name },
      email: -> { Faker::Internet.email },
-     city: -> 'Portland'
+     city: 'Portland'
    },
    accounts: {
      bank_name: -> { Faker::Bank.name },
@@ -64,6 +78,9 @@ require 'faker'
  }
  ```

+ ## Todo
+ - [ ] Write some tests (so far this has been tested manually)
+
  ## Development

  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
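The README hunks above describe three kinds of field definition: a static value, a zero-argument lambda, and a two-argument lambda that receives the original value plus a per-row context hash. The following is a minimal sketch of such a definition, assuming nothing about how the gem loads the file. The table name, field names, and the `DEFINITION` constant are hypothetical, and `SecureRandom` stands in for a faking gem so the example needs only the standard library. In a real definition file the hash would be the file's return value rather than a constant.

```ruby
require 'securerandom'

# Hypothetical definition: the table and field names are invented for illustration.
DEFINITION = {
  users: {
    city: 'Portland',                      # static replacement
    api_token: -> { SecureRandom.hex(8) }, # zero-argument lambda, new value per call
    email: ->(original_value, row_context) do
      masked = original_value ? 'user@example.com' : nil
      row_context[:had_email] = !masked.nil? # remember the outcome for later fields
      masked
    end,
    email_status: ->(_original_value, row_context) do
      row_context[:had_email] ? 'unverified' : 'missing'
    end
  }
}.freeze

# Calling the lambdas by hand, with one shared hash standing in for a single row.
row_context = {}
users = DEFINITION[:users]
puts users[:email].call('someone@corp.test', row_context)
puts users[:email_status].call(nil, row_context)
```

Because `email_status` reads what `email` wrote, the two fields must be evaluated in that order and against the same hash.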
@@ -27,9 +27,13 @@ module PgDumpAnonymize
  # This assumes the line is a tab delimited data line
  def anonymize_line(line)
    values = line.split("\t")
+   row_context = {} # used to share state for a row
    @positional_substitutions.each do |index, val_def|
-     anonymous_value = val_def.is_a?(Proc) ? val_def.call : val_def
-     values[index] = anonymous_value
+     values[index] = if val_def.is_a?(Proc)
+                       val_def.call(*[values[index], row_context].slice(0, val_def.arity))
+                     else
+                       val_def
+                     end
    end
    values.join("\t")
  end
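The arity-based dispatch added in this hunk can be tried in isolation. The sketch below is a standalone approximation rather than the gem's code: `call_definition`, `SUBSTITUTIONS`, and the extra `substitutions` parameter are hypothetical, and only the slicing expression is taken from the diff.

```ruby
# Standalone sketch of the dispatch added in 0.1.1 (helper names are hypothetical).
# A definition may be a plain value or a lambda taking 0, 1, or 2 arguments.
def call_definition(val_def, original_value, row_context)
  return val_def unless val_def.is_a?(Proc)

  # Pass only as many arguments as the lambda declares.
  args = [original_value, row_context].slice(0, val_def.arity)
  val_def.call(*args)
end

# Column index => definition, mirroring the shape of @positional_substitutions.
SUBSTITUTIONS = {
  1 => 'static',
  2 => -> { 'zero-arg' },
  3 => ->(original) { original.upcase },
  4 => ->(original, ctx) { ctx[:seen] = original; 'masked' },
  5 => ->(_original, ctx) { ctx[:seen] ? 'had value' : 'empty' }
}.freeze

def anonymize_line(line, substitutions)
  values = line.split("\t")
  row_context = {} # fresh state for every row
  substitutions.each do |index, val_def|
    values[index] = call_definition(val_def, values[index], row_context)
  end
  values.join("\t")
end

puts anonymize_line("1\ta\tb\tc\td\te", SUBSTITUTIONS)
```

One consequence of slicing by `arity`: a lambda with optional or splat parameters reports a negative arity, so `slice` returns `nil` and the lambda is called with no arguments. Definitions that need the original value or the row context should declare them as required parameters.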
@@ -1,3 +1,3 @@
  module PgDumpAnonymize
-   VERSION = "0.1.0"
+   VERSION = "0.1.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pg_dump_anonymize
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.1.1
  platform: ruby
  authors:
  - Sean McCleary
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2020-10-05 00:00:00.000000000 Z
+ date: 2020-12-30 00:00:00.000000000 Z
  dependencies: []
  description: Given the default pg_dump text SQL dump, this can take a simple definition
    of tables and fields to anonymize and efficiently anonymize the dump.