pg_dump_anonymize 0.1.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 903be54f2c5727d77dc205f98565863caa7977639c6302a47d9a52f7cdcac15e
4
+ data.tar.gz: 244fe8be50f46e717bddf6222f46360b2cff2be016285f101a0f9698a10bed6e
5
+ SHA512:
6
+ metadata.gz: 1d4a8ef4dc0050fd1fd942e3c4925bfe0a1cbf5145b6b643fc1188e0479fcfb5262e04d90cfae9715ac6f0f3060757ae4da21eb0ea7c72cb6785f7b39290e183
7
+ data.tar.gz: 0c8f53ddcf4d46c5ea49426059419848aa97205579e99ab7a44eed5f6143bf603c994ccd52a58f3928b2171204e18e33648b95b0a9234e96dae9bebb02eaae78
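The digests above can be checked against a downloaded archive. The following is a minimal sketch of that comparison, assuming Ruby's standard `digest` library; `checksum_matches?` is a hypothetical helper name, and the payload string stands in for the bytes of `metadata.gz` or `data.tar.gz`.

```ruby
require 'digest'

# Illustrative helper (hypothetical name): compare a payload's SHA256 hex
# digest against an expected value such as one listed in the checksums file.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Stand-in payload; in practice this would be File.binread('metadata.gz').
payload = 'example gem payload'
expected = Digest::SHA256.hexdigest(payload)

matches = checksum_matches?(payload, expected)
tampered = checksum_matches?("#{payload}x", expected)
```

Here `matches` is true and `tampered` is false, since any change to the bytes changes the digest.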
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,6 @@
1
+ ---
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - 2.7.1
6
+ before_install: gem install bundler -v 2.1.4
data/Gemfile ADDED
@@ -0,0 +1,7 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in pg_dump_anonymize.gemspec
4
+ gemspec
5
+
6
+ gem "rake", "~> 12.0"
7
+ gem "rspec", "~> 3.0"
data/Gemfile.lock ADDED
@@ -0,0 +1,34 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ pg_dump_anonymize (0.1.0)
5
+
6
+ GEM
7
+ remote: https://rubygems.org/
8
+ specs:
9
+ diff-lcs (1.4.4)
10
+ rake (12.3.3)
11
+ rspec (3.9.0)
12
+ rspec-core (~> 3.9.0)
13
+ rspec-expectations (~> 3.9.0)
14
+ rspec-mocks (~> 3.9.0)
15
+ rspec-core (3.9.3)
16
+ rspec-support (~> 3.9.3)
17
+ rspec-expectations (3.9.2)
18
+ diff-lcs (>= 1.2.0, < 2.0)
19
+ rspec-support (~> 3.9.0)
20
+ rspec-mocks (3.9.1)
21
+ diff-lcs (>= 1.2.0, < 2.0)
22
+ rspec-support (~> 3.9.0)
23
+ rspec-support (3.9.3)
24
+
25
+ PLATFORMS
26
+ ruby
27
+
28
+ DEPENDENCIES
29
+ pg_dump_anonymize!
30
+ rake (~> 12.0)
31
+ rspec (~> 3.0)
32
+
33
+ BUNDLED WITH
34
+ 2.1.4
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Sean McCleary
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,84 @@
1
+ # PgDumpAnonymize
2
+
3
+ Anonymizing pg_dump data isn't always straightforward. This tool helps. It has no dependencies other than Ruby. Because it works with Unix pipes and passes the dump through as IO data, sensitive data never needs to be stored temporarily.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'pg_dump_anonymize'
11
+ ```
12
+
13
+ And then execute:
14
+
15
+ $ bundle install
16
+
17
+ Or install it yourself as:
18
+
19
+ $ gem install pg_dump_anonymize
20
+
21
+ ## Usage
22
+
23
+ Example usage:
24
+
25
+ ```
26
+ pg_dump --no-privileges --no-owner <database-name> | pg_dump_anonymize -d sample_definition.rb > anonymized_dump.sql
27
+ ```
28
+
29
+ ### Definition File
30
+
31
+ The definition file can contain any Ruby code you'd like, but it should return a hash with table names as the top-level keys and attribute names nested under the table keys. Each attribute value is either a literal, which is used as-is for every row, or a lambda, which is called once per row. Any faking gems, and any other gems you choose to use in your definitions, need to be available on the server this is run on.
32
+
33
+ Example:
34
+
35
+ ```ruby
36
+ {
37
+ table_1: {
38
+ sensitive_field_1: 'some string value',
39
+ sensitive_field_2: -> { rand(100) },
40
+ },
41
+ table_2: {
42
+ sensitive_field_3: nil
43
+ }
44
+ }
45
+ ```
46
+
47
+ Here is a more concrete example using the [faker gem](https://github.com/faker-ruby/faker).
48
+
49
+ ```ruby
50
+ require 'faker'
51
+
52
+ {
53
+ users: {
54
+ first_name: -> { Faker::Name.first_name },
55
+ last_name: -> { Faker::Name.last_name },
56
+ email: -> { Faker::Internet.email },
57
+ city: 'Portland'
58
+ },
59
+ accounts: {
60
+ bank_name: -> { Faker::Bank.name },
61
+ account_num: -> { Faker::Bank.account_number },
62
+ routing_num: -> { Faker::Bank.routing_number }
63
+ }
64
+ }
65
+ ```
66
+
67
+ ## Development
68
+
69
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
70
+
71
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
72
+
73
+ ## Performance
74
+
75
+ This has been tested with a 1GB dump file, and it took 11 seconds on a 2013 MacBook Pro to anonymize it. Speed largely depends on how much you're anonymizing and on the anonymizing definitions you're applying, so mileage will vary. Still, this is plenty fast for most of my needs.
76
+
77
+ ## Contributing
78
+
79
+ Bug reports and pull requests are welcome on GitHub at https://github.com/mrinterweb/pg_dump_anonymize.
80
+
81
+
82
+ ## License
83
+
84
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
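To make the README's definition-file contract concrete, here is a minimal sketch of how a rule value might be resolved for a single row: a lambda is called, anything else is used unchanged. `resolve_rule` and the sample `definitions` hash are hypothetical names used only for illustration; the logic mirrors the `is_a?(Proc)` check in `lib/pg_dump_anonymize/definition.rb` and is not an additional API of the gem.

```ruby
# Hypothetical definition hash in the shape the README describes:
# table names at the top level, column names nested beneath them.
definitions = {
  users: {
    city: 'Portland',              # literal: same value for every row
    login_count: -> { rand(100) }, # lambda: called once per row
    nickname: nil                  # nil is also treated as a literal
  }
}

# Illustrative helper (not part of the gem): resolve one rule to a value.
def resolve_rule(rule)
  rule.is_a?(Proc) ? rule.call : rule
end

# Resolve every rule for one imaginary row of the users table.
resolved = definitions[:users].map { |column, rule| [column, resolve_rule(rule)] }.to_h
```

Because lambdas are re-evaluated for each row, generators such as Faker calls produce a different value per row, while literals repeat.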
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "pg_dump_anonymize"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/exe/pg_dump_anonymize ADDED
@@ -0,0 +1,35 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $LOAD_PATH << File.expand_path('../lib', __dir__)
4
+ require 'pg_dump_anonymize'
5
+ require 'optparse'
6
+
7
+ options = {}
8
+ OptionParser.new do |opts|
9
+ opts.banner = "Usage: #{File.basename(__FILE__)} [options]"
10
+
11
+ opts.on('-d', '--definition DEFINITION_FILE', 'Definition file to read. This is required') do |v|
12
+ options[:definition_file_path] = v
13
+ end
14
+
15
+ opts.on('-f', '--file OUTPUT_FILE', 'Output file') do |v|
16
+ options[:output_file_path] = v
17
+ end
18
+ end.parse!
19
+
20
+ def_file = options[:definition_file_path]
21
+
22
+ unless def_file && File.exist?(def_file)
23
+ warn 'Definition file not found'
24
+ warn 'See usage with --help'
25
+ exit 1
26
+ end
27
+
28
+ output_file = options[:output_file_path]
29
+ output = if output_file
30
+ File.open(output_file, 'w')
31
+ else
32
+ STDOUT
33
+ end
34
+
35
+ PgDumpAnonymize.anonymize(def_file, STDIN, output)
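The option handling in the executable above can be tried in isolation. This is a small sketch, assuming only Ruby's standard `optparse` library; `parse_cli` is a hypothetical wrapper name, and the file names are the ones used in the README example.

```ruby
require 'optparse'

# Illustrative helper (hypothetical name): parse an argv array with the same
# two switches the executable defines and return the collected options.
def parse_cli(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('-d', '--definition DEFINITION_FILE', 'Definition file to read') do |v|
      options[:definition_file_path] = v
    end
    opts.on('-f', '--file OUTPUT_FILE', 'Output file') do |v|
      options[:output_file_path] = v
    end
  end.parse!(argv)
  options
end

with_file = parse_cli(%w[-d sample_definition.rb -f anonymized_dump.sql])
stdout_only = parse_cli(%w[--definition sample_definition.rb])
```

When `-f` is omitted, `:output_file_path` is absent from the hash, which is the case where the executable falls back to `STDOUT`.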
data/lib/pg_dump_anonymize.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'pg_dump_anonymize/version'
4
+ require 'pg_dump_anonymize/definition'
5
+
6
+ module PgDumpAnonymize
7
+ class Error < StandardError; end
8
+
9
+ def self.anonymize(definitions_file_path, input_io, output_io)
10
+ definitions_hash = eval(File.read(definitions_file_path)) # rubocop:disable Security/Eval
11
+ definitions = Definition.new(definitions_hash)
12
+
13
+ input_io.each_line do |line|
14
+ output_io.write definitions.process_line(line)
15
+ end
16
+ end
17
+ end
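The loop in `anonymize` works on any IO-like objects, which is what lets the tool sit in a Unix pipe without buffering the whole dump. The sketch below illustrates that streaming pattern with `StringIO` stand-ins; `stream_lines` is a hypothetical name, and the block passed to it takes the place of `Definition#process_line`.

```ruby
require 'stringio'

# Illustrative stand-in for the streaming loop: read one line at a time,
# transform it, and write it straight out, so the input is never held in
# memory as a whole.
def stream_lines(input_io, output_io)
  input_io.each_line do |line|
    output_io.write(yield(line))
  end
end

input = StringIO.new("keep me\nsecret\n")
output = StringIO.new
stream_lines(input, output) { |line| line.sub('secret', 'redacted') }
streamed = output.string
```

In the executable the same roles are played by `STDIN` and either `STDOUT` or the file given with `-f`.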
data/lib/pg_dump_anonymize/definition.rb ADDED
@@ -0,0 +1,77 @@
1
+ # frozen_string_literal: true
2
+
3
+ module PgDumpAnonymize
4
+ # This is used to define rules and apply the rules when parsing the dump sql file
5
+ class Definition
6
+ def initialize(attribute_rules)
7
+ @attribute_rules = attribute_rules
8
+ @current_table = nil
9
+ @positional_substitutions = nil
10
+ end
11
+
12
+ def process_line(line)
13
+ if @current_table
14
+ if end_stdin?(line)
15
+ clear_current_table
16
+ else
17
+ line = anonymize_line(line)
18
+ end
19
+ else
20
+ process_copy_line(line)
21
+ end
22
+ line
23
+ end
24
+
25
+ private
26
+
27
+ # This assumes the line is a tab delimited data line
28
+ def anonymize_line(line)
29
+ values = line.chomp.split("\t", -1) # -1 keeps trailing empty columns; the newline is re-added below
30
+ @positional_substitutions.each do |index, val_def|
31
+ anonymous_value = val_def.is_a?(Proc) ? val_def.call : val_def
32
+ values[index] = anonymous_value
33
+ end
34
+ "#{values.join("\t")}\n"
35
+ end
36
+
37
+ def process_copy_line(line)
38
+ match_data = line.match(line_regex)
39
+ return unless match_data
40
+
41
+ table = match_data[:table_name].to_sym
42
+ fields = match_data[:field_defs]
43
+
44
+ @current_table = table
45
+ @positional_substitutions = find_positions(fields, @attribute_rules[table])
46
+ end
47
+
48
+ # Finds the column index of each attribute to be replaced.
49
+ # Returns an array of arrays. Each inner array is [<field_index>, <anonymous_value>]
50
+ def find_positions(fields_str, rules)
51
+ fields = fields_str.gsub('"', '').split(', ')
52
+
53
+ rules.map do |target_field, val|
54
+ index = fields.index(target_field.to_s)
55
+ [index, val] if index
56
+ end.compact
57
+ end
58
+
59
+ def line_regex
60
+ @line_regex ||= /^COPY public\.(?<table_name>#{table_names.join('|')}) \((?<field_defs>.*)\) FROM stdin;$/
61
+ end
62
+
63
+ # The COPY data block is terminated by a line that is just '\.'
64
+ def end_stdin?(line)
65
+ line =~ /^\\\.$/
66
+ end
67
+
68
+ def table_names
69
+ @attribute_rules.keys
70
+ end
71
+
72
+ def clear_current_table
73
+ @current_table = nil
74
+ @positional_substitutions = nil
75
+ end
76
+ end
77
+ end
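The core of `Definition` is a two-step process: match a `COPY ... FROM stdin;` header to learn the column order, then overwrite the matching positions in each tab-delimited data row. The following is a self-contained sketch of those steps with a hypothetical `rules` hash and made-up sample data. As a choice of this sketch, the trailing newline is stripped before splitting and re-appended afterwards, so that replacing the last column does not drop the line ending.

```ruby
# Hypothetical rules and dump excerpt, for illustration only.
rules = { users: { email: 'anon@example.com', last_name: -> { 'Doe' } } }

copy_header = 'COPY public.users (id, first_name, "last_name", email) FROM stdin;'
copy_regex = /^COPY public\.(?<table_name>#{rules.keys.join('|')}) \((?<field_defs>.*)\) FROM stdin;$/

match = copy_header.match(copy_regex)
table = match[:table_name].to_sym
fields = match[:field_defs].delete('"').split(', ')

# Pair each rule with the index of its column in the COPY column list,
# skipping rules whose column is absent from this table.
positions = rules[table].map do |column, rule|
  index = fields.index(column.to_s)
  [index, rule] if index
end.compact

# Apply the substitutions to one tab-delimited data row.
row = "1\tJane\tSmith\tjane@real.example\n"
values = row.chomp.split("\t", -1)
positions.each do |index, rule|
  values[index] = rule.is_a?(Proc) ? rule.call : rule
end
anonymized_row = "#{values.join("\t")}\n"
```

Columns without a rule (`id` and `first_name` here) pass through untouched, and the row keeps its original column count.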
data/lib/pg_dump_anonymize/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module PgDumpAnonymize
2
+ VERSION = "0.1.0"
3
+ end
data/pg_dump_anonymize.gemspec ADDED
@@ -0,0 +1,29 @@
1
+ require_relative 'lib/pg_dump_anonymize/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = "pg_dump_anonymize"
5
+ spec.version = PgDumpAnonymize::VERSION
6
+ spec.authors = ["Sean McCleary"]
7
+ spec.email = ["seanmcc@gmail.com"]
8
+
9
+ spec.summary = %q{A tool to take pg_dump SQL and anonymize defined tables and fields}
10
+ spec.description = %q{Given the default pg_dump text SQL dump, this can take a simple definition of tables and fields to anonymize and efficiently anonymize the dump.}
11
+ spec.homepage = "https://github.com/mrinterweb/pg_dump_anonymize"
12
+ spec.license = "MIT"
13
+ spec.required_ruby_version = Gem::Requirement.new(">= 2.3.0")
14
+
15
+ # spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
16
+
17
+ spec.metadata["homepage_uri"] = spec.homepage
18
+ spec.metadata["source_code_uri"] = spec.homepage
19
+ # spec.metadata["changelog_uri"] = "TODO: Put your gem's CHANGELOG.md URL here."
20
+
21
+ # Specify which files should be added to the gem when it is released.
22
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
23
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
24
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
25
+ end
26
+ spec.bindir = "exe"
27
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
28
+ spec.require_paths = ["lib"]
29
+ end
data/sample_definition.rb ADDED
@@ -0,0 +1,9 @@
1
+ require 'faker'
2
+
3
+ {
4
+ users: {
5
+ first_name: -> { Faker::Name.first_name },
6
+ last_name: -> { Faker::Name.last_name },
7
+ email: -> { Faker::Internet.email }
8
+ }
9
+ }
metadata ADDED
@@ -0,0 +1,63 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: pg_dump_anonymize
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Sean McCleary
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2020-10-05 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: Given the default pg_dump text SQL dump, this can take a simple definition
14
+ of tables and fields to anonymize and efficiently anonymize the dump.
15
+ email:
16
+ - seanmcc@gmail.com
17
+ executables:
18
+ - pg_dump_anonymize
19
+ extensions: []
20
+ extra_rdoc_files: []
21
+ files:
22
+ - ".gitignore"
23
+ - ".rspec"
24
+ - ".travis.yml"
25
+ - Gemfile
26
+ - Gemfile.lock
27
+ - LICENSE.txt
28
+ - README.md
29
+ - Rakefile
30
+ - bin/console
31
+ - bin/setup
32
+ - exe/pg_dump_anonymize
33
+ - lib/pg_dump_anonymize.rb
34
+ - lib/pg_dump_anonymize/definition.rb
35
+ - lib/pg_dump_anonymize/version.rb
36
+ - pg_dump_anonymize.gemspec
37
+ - sample_definition.rb
38
+ homepage: https://github.com/mrinterweb/pg_dump_anonymize
39
+ licenses:
40
+ - MIT
41
+ metadata:
42
+ homepage_uri: https://github.com/mrinterweb/pg_dump_anonymize
43
+ source_code_uri: https://github.com/mrinterweb/pg_dump_anonymize
44
+ post_install_message:
45
+ rdoc_options: []
46
+ require_paths:
47
+ - lib
48
+ required_ruby_version: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - ">="
51
+ - !ruby/object:Gem::Version
52
+ version: 2.3.0
53
+ required_rubygems_version: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: '0'
58
+ requirements: []
59
+ rubygems_version: 3.1.2
60
+ signing_key:
61
+ specification_version: 4
62
+ summary: A tool to take pg_dump SQL and anonymize defined tables and fields
63
+ test_files: []