pg_dump_anonymize 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/Gemfile.lock +1 -1
data/README.md +20 -3
data/lib/pg_dump_anonymize/definition.rb +6 -2
data/lib/pg_dump_anonymize/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 903be54f2c5727d77dc205f98565863caa7977639c6302a47d9a52f7cdcac15e
-  data.tar.gz: 244fe8be50f46e717bddf6222f46360b2cff2be016285f101a0f9698a10bed6e
+  metadata.gz: 56f41e435e9420140a7969c4d536a6a517836ca5d8d06e5be8bfca127fc939f0
+  data.tar.gz: 016d7a8b99e9f67046e7192856a13eb9bcf5ab018ce7cd61a00ed5b3375783b0
 SHA512:
-  metadata.gz: 1d4a8ef4dc0050fd1fd942e3c4925bfe0a1cbf5145b6b643fc1188e0479fcfb5262e04d90cfae9715ac6f0f3060757ae4da21eb0ea7c72cb6785f7b39290e183
-  data.tar.gz: 0c8f53ddcf4d46c5ea49426059419848aa97205579e99ab7a44eed5f6143bf603c994ccd52a58f3928b2171204e18e33648b95b0a9234e96dae9bebb02eaae78
+  metadata.gz: a69b3a188689de3d2636a28ee45e7b7902f5ebea3292f675ff7283c834ff67b6c3550fb3b979bfee4487197eb20c5d71773b92b255023476663824b7aede7ceb
+  data.tar.gz: e04e32850bb4ad0ea99e426e5e6fb222773ea570cac6b305527fe567b6bc80e6db81158a5bcc6ce9b05c49d5b224bf8f87fa4ba2624cf5425f34e277145450a4

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    pg_dump_anonymize (0.1.0)
+    pg_dump_anonymize (0.1.1)
 GEM
   remote: https://rubygems.org/

data/README.md CHANGED

@@ -1,7 +1,9 @@
-# PgDumpAnonymize
+# pg_dump_anonymize
 Anonymizing pg_dump data isn't always straight forward. This tool helps. It has no dependencies, other than ruby. Another good thing is that sensitive data doesn't need to ever be stored temporarily as this can be used with Unix pipes and the data is passed through as IO data.
+`pg_dump_anonymize` does not anonymize any data automatically. It is very much BYOAD (Bring Your Own Anonymizing Definition). Inside your anonymizing definition, you can use any ruby gems you like, such as [faker](https://github.com/faker-ruby/faker). This gem makes applying anonymizing definitions to `pg_dump` output easy.
 ## Installation
 Add this line to your application's Gemfile:
@@ -20,15 +22,19 @@ Or install it yourself as:
 ## Usage
+`pg_dump_anonymize` does not anonymize any data by default. You must provide your own anonymizing definition. The gem currently requires the format of output `pg_dump` is plain text (not compressed) and uses the default `COPY` behavior instead of `INSERT INTO`.
 Example usage:
 ```
 pg_dump --no-privileges --no-owner <database-name> | pg_dump_anonymize -d sample_definition.rb > anonymized_dump.sql
 ```
+You can also pipe the anonymized SQL directly into `psql` to avoid intermediate SQL dump files.
 ### Definition File
-The definition file is any ruby code you'd like, but it should return a hash with table names as the top level keys and attribute names nested under the table keys. Any faking ruby gems dependencies, and other gem dependencies you may choose to use in your definitions, are gems that will need to be available on the server this is ran on.
+The definition file is any ruby code you'd like, but it must return a hash with table names as the top level keys and attribute names nested under the table keys. Any faking ruby gems dependencies, and other gem dependencies you may choose to use in your definitions, are gems that will need to be available on the server this is ran on.
 Example:
@@ -37,6 +43,14 @@ Example:
   table_1: {
     sensitive_field_1: 'some string value',
     sensitive_field_2: -> { rand(100) },
+    sensitive_field_3: -> (original_value, row_context) do
+      if original_value
+        'xxxxx'
+      end.tap { |new_val| row_context[field_3_val] = new_val }
+    end
+    sensitive_field_4: -> (_original_value, row_context) do
+      row_context[:field_3_val] ? 'foo' : 'bar'
+    end
   },
   table_2: {
     sensitive_field_3: nil
@@ -54,7 +68,7 @@ require 'faker'
     first_name: -> { Faker::Name.first_name },
     last_name: -> { Faker::Name.last_name },
     email: -> { Faker::Internet.email },
-    city: -> 'Portland'
+    city: 'Portland'
   },
   accounts: {
     bank_name: -> { Faker::Bank.name },
@@ -64,6 +78,9 @@ require 'faker'
 }
 ```
+## Todo
+- [ ] Write some tests (so far this has been tested manually)
 ## Development
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
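
As displayed, the README example added in this hunk would not load as Ruby: there is no comma after the `sensitive_field_3` lambda, and `row_context[field_3_val]` uses a bare identifier as the key where the next lambda reads `:field_3_val`. The sketch below shows what the example presumably intends, assuming a comma and the symbol key. The `anonymize_row` helper is hypothetical and not part of the gem; it only imitates the per-row call pattern from `definition.rb` so the definition can be exercised without a real dump, and it relies on hash insertion order, which may differ from the order the gem applies substitutions in.

```ruby
# Assumed-intent version of the README definition; table and field names
# are the README's own placeholders.
definition = {
  table_1: {
    sensitive_field_1: 'some string value',
    sensitive_field_2: -> { rand(100) },
    sensitive_field_3: ->(original_value, row_context) do
      new_val = original_value ? 'xxxxx' : nil
      row_context[:field_3_val] = new_val # share the result with later fields
      new_val
    end,
    sensitive_field_4: ->(_original_value, row_context) do
      row_context[:field_3_val] ? 'foo' : 'bar'
    end
  },
  table_2: {
    sensitive_field_3: nil
  }
}

# Hypothetical helper (not the gem's API): applies one table's field
# definitions to a row given as a field => value hash.
def anonymize_row(field_defs, row)
  row_context = {} # fresh state for every row
  row.each_with_object({}) do |(field, original), out|
    unless field_defs.key?(field)
      out[field] = original # untouched column
      next
    end
    val_def = field_defs[field]
    out[field] = if val_def.is_a?(Proc)
                   # Pass only as many arguments as the lambda declares.
                   val_def.call(*[original, row_context].slice(0, val_def.arity))
                 else
                   val_def
                 end
  end
end

with_value = anonymize_row(
  definition[:table_1],
  { id: '1', sensitive_field_1: 'a', sensitive_field_2: 'b',
    sensitive_field_3: 'secret', sensitive_field_4: 'c' }
)
without_value = anonymize_row(
  definition[:table_1],
  { id: '2', sensitive_field_3: nil, sensitive_field_4: 'c' }
)
```

Here `with_value[:sensitive_field_4]` depends on what the `sensitive_field_3` lambda stored in `row_context` for the same row, which is the cross-field use case the new two-argument form appears to target.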

data/lib/pg_dump_anonymize/definition.rb CHANGED

@@ -27,9 +27,13 @@ module PgDumpAnonymize
     # This assumes the line is a tab delimited data line
     def anonymize_line(line)
       values = line.split("\t")
+      row_context = {} # used to share state for a row
       @positional_substitutions.each do |index, val_def|
-        anonymous_value = val_def.is_a?(Proc) ? val_def.call : val_def
-        values[index] = anonymous_value
+        values[index] = if val_def.is_a?(Proc)
+                          val_def.call(*[values[index], row_context].slice(0, val_def.arity))
+                        else
+                          val_def
+                        end
       end
       values.join("\t")
     end
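
The key change in this hunk is that a lambda now receives only as many arguments as it declares, so existing zero-argument definitions keep working while new ones can accept the original value and the shared `row_context`. A minimal sketch of that dispatch in isolation follows; `apply_substitution` is a hypothetical name used for illustration, not a method of the gem.

```ruby
# Hypothetical stand-in for the dispatch inside anonymize_line.
def apply_substitution(val_def, original_value, row_context)
  # Non-Proc definitions (strings, nil, numbers) are used as-is.
  return val_def unless val_def.is_a?(Proc)

  # slice(0, arity) trims the argument list to what the lambda declares:
  # arity 0 -> [], arity 1 -> [original_value], arity 2 -> both values.
  args = [original_value, row_context].slice(0, val_def.arity)
  val_def.call(*args)
end

row_context = {}
constant = apply_substitution('Portland', 'Seattle', row_context)
no_args  = apply_substitution(-> { 'fixed' }, 'Seattle', row_context)
one_arg  = apply_substitution(->(orig) { orig.upcase }, 'Seattle', row_context)
two_args = apply_substitution(
  ->(orig, ctx) { ctx[:seen] = orig; 'xxxxx' },
  'Seattle',
  row_context
)
```

This approach depends on `Proc#arity` being non-negative, so it is safest with lambdas that declare only required positional parameters.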

data/lib/pg_dump_anonymize/version.rb CHANGED

@@ -1,3 +1,3 @@
 module PgDumpAnonymize
-  VERSION = "0.1.0"
+  VERSION = "0.1.1"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pg_dump_anonymize
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - Sean McCleary
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2020-10-05 00:00:00.000000000 Z
+date: 2020-12-30 00:00:00.000000000 Z
 dependencies: []
 description: Given the default pg_dump text SQL dump, this can take a simple definition
   of tables and fields to anonymize and efficiently anonymize the dump.