witsec 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 4ac7324ae9480a4bbabaca4d462bb4205a2fa470b1f872c7226f3a809c2c588f
+   data.tar.gz: 1d2fa60ac0531ad1f010f74658f7a6aeadddd0f343fc4712f9cc2f0caeb097d2
+ SHA512:
+   metadata.gz: d580f3f361aaac94331a2935c7ed79da3e82106925f1c0012bc6b6e5319de1e77e1a28e295a8c47eb0ab42c8cec56a11f69b1363034397633fbdf37964750a56
+   data.tar.gz: c46f2392c59e03d4b81ab353c4e83b1c928eaf7974b21e3e6e55c70707fc33b3b9a5937d2990fd8bb47d2a69d3a3b5b481c777b2e5454472d8efe52de58f4e9b
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright Nicolai Bach Woller
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,87 @@
+ # Witsec
+ When developing Rails applications, you end up with a large difference between the size of the database used in development and the database in production. This makes it hard to gauge how performance is affected as the amount of data grows.
+
+ You can try to generate a large set of data, but there is no guarantee that the data you generate will produce the same performance issues as the real data would. Another approach is to download a database dump, but then you have real production data lying on your machine, including any sensitive data like SSNs or addresses.
+
+ This gem tries to avoid this by copying all data to a new database and anonymizing it in the process. The new database can then be dumped and used in development.
+
+ ## Installation
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem "witsec"
+ # if you want a simple way to generate fake data
+ gem "faker"
+ ```
+
+ And then execute:
+ ```bash
+ $ bundle
+ ```
+ Then create a new file at `config/witsec/schema.rb`. This path will become configurable in a later version.
+ Finally, configure a second database to store the anonymized data in:
+ ```yaml
+ production:
+   primary:
+     <<: *default
+     host: your_host_url
+     database: your_app_name
+   anonymized:
+     <<: *default
+     host: some_url_that_might_or_might_not_be_the_same_as_your_host_url
+     database: anonymized # This must be called anonymized for now. It will become configurable in a later version.
+     migrations_paths: db/migrate
+ ```
+
+ ## Usage
+ Witsec uses a schema file to determine what to anonymize and how to do it.
+
+ ```ruby
+ # config/witsec/schema.rb
+ Witsec::Schema.define(2025_01_15_142512) do
+   anonymize_table "addresses" do |t|
+     t.column "street", using: -> { Faker::Address.street_address }
+     t.column "zip_code", using: -> { Faker::Address.zip_code }
+     t.column "city", using: "New York"
+   end
+
+   include_table "animals"
+
+   exclude_table "government_secrets"
+ end
+ ```
+ `Witsec::Schema.define` requires an integer argument. It should match the latest timestamp in your app's `db/schema.rb` and is used to ensure that you have considered any changes introduced in database migrations. A warning is shown if a mismatch is detected when you run the `bin/rails witsec:schema:verify` task. In a later version, an error will be raised when attempting to anonymize a database with mismatched versions.
+
+ There are three ways to handle a table in the schema:
+
+ #### `anonymize_table`
+ Takes the name of a table to be anonymized and a block that determines how each column should be masked. In the example above, [Faker](https://github.com/faker-ruby/faker) is used to provide a random address, but you can put whatever you want in the lambda, or even provide a static value as is done for the city column.
+
+ Any column not mentioned in the block will **not** be anonymized.
+
+ #### `include_table`
+ Takes the name of a table to be copied in its entirety without any masking at all. Use this for tables without **any** sensitive data.
+
+ #### `exclude_table`
+ Takes the name of a table to be excluded. No data will be copied. If any other tables reference anything in an excluded table, you are probably going to have a bad time.
+
+ ### Rake tasks
+ Witsec comes with some tasks for anonymizing the database and verifying that the schema is up to date.
+
+ #### `witsec:anonymize`
+ Anonymizes the app's primary database using the configuration defined in your schema.
+
+ #### `witsec:schema:verify_tables`
+ Checks that the tables in your database are all mentioned in your Witsec schema. Useful as a step in your CI.
+
+ #### `witsec:schema:verify_version`
+ Checks that your `Witsec::Schema` version matches the version of your latest run migration. Useful as a step in your CI.
+
+ #### `witsec:schema:verify`
+ Runs all other verifications.
+
+ ## Contributing
+ Bug reports and pull requests are welcome on GitHub at https://github.com/traels-it/witsec.
+
+ ## License
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
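The version argument in the README's schema example is written with underscores. Ruby ignores underscores in integer literals, so the value can be compared directly with an integer schema version. A minimal sketch with made-up values follows; `app_version` is a hypothetical stand-in for whatever version the app's database reports.

```ruby
# Underscores in integer literals are purely visual separators.
witsec_version = 2025_01_15_142512

# Hypothetical stand-in for the app's current schema version.
app_version = 20250115142512

versions_match = (witsec_version == app_version)
puts "versions match: #{versions_match}"
```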
data/Rakefile ADDED
@@ -0,0 +1,3 @@
+ require "bundler/setup"
+
+ require "bundler/gem_tasks"
data/lib/tasks/witsec_tasks.rake ADDED
@@ -0,0 +1,71 @@
+ namespace :witsec do
+   desc "Anonymizes the app's primary database"
+   task anonymize: :environment do
+     Witsec::Anonymizer.new.anonymize
+   end
+
+   namespace :schema do
+     desc "Verify that the Witsec schema defines an anonymization for each table"
+     task :verify_tables, [:raise_error] => :environment do |_t, args|
+       args.with_defaults(raise_error: false)
+
+       anonymizer = Witsec::Anonymizer.new
+
+       if anonymizer.schema.table_names != ActiveRecord::Base.connection.tables.sort
+         missing_tables = ActiveRecord::Base.connection.tables - anonymizer.schema.table_names
+         extra_tables = anonymizer.schema.table_names - ActiveRecord::Base.connection.tables
+
+         messages = ["Witsec schema contains errors"]
+
+         messages << "Missing tables: #{missing_tables.join(", ")}" if missing_tables.any?
+         messages << "Extra tables: #{extra_tables.join(", ")}" if extra_tables.any?
+
+         return unless messages.any?
+
+         if args[:raise_error]
+           raise Witsec::TableMismatchError, messages.join("\n")
+         else
+           abort messages.join("\n")
+         end
+       end
+     end
+
+     desc "Verify the Witsec schema version matches the version of the latest run migration"
+     task :verify_version, [:raise_error] => :environment do |_t, args|
+       args.with_defaults(raise_error: false)
+
+       app_schema_version = ActiveRecord::Base.connection.schema_version
+       witsec_schema_version = Witsec::Anonymizer.new.schema.version
+       message = "Witsec schema version (#{witsec_schema_version}) does not match app's schema version (#{app_schema_version})"
+
+       if app_schema_version != witsec_schema_version
+         if args[:raise_error]
+           raise Witsec::VersionError, message
+         else
+           abort message
+         end
+       end
+     end
+
+     desc "Runs all Witsec verifications"
+     task verify: :environment do
+       error_messages = []
+       begin
+         Rake::Task["witsec:schema:verify_tables"].invoke(true)
+       rescue Witsec::TableMismatchError => error
+         error_messages << error.message
+       end
+       begin
+         Rake::Task["witsec:schema:verify_version"].invoke(true)
+       rescue Witsec::VersionError => error
+         error_messages << error.message
+       end
+
+       if error_messages.any?
+         abort error_messages.join("\n")
+       else
+         puts "Witsec schema is all good 🎉"
+       end
+     end
+   end
+ end
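The `verify_tables` task above relies on array difference to find tables that exist in only one of the two lists. The comparison can be sketched in plain Ruby without Rails or Rake. The helper name `table_mismatch_messages` and the table lists are hypothetical and used only for illustration.

```ruby
# Hypothetical helper: reports tables that are present in only one of the lists.
def table_mismatch_messages(schema_tables, database_tables)
  missing = database_tables - schema_tables # in the database, not in the schema
  extra = schema_tables - database_tables   # in the schema, not in the database

  messages = []
  messages << "Missing tables: #{missing.join(", ")}" if missing.any?
  messages << "Extra tables: #{extra.join(", ")}" if extra.any?
  messages
end

# Made-up table lists standing in for the Witsec schema and the database.
schema_tables = %w[addresses animals government_secrets old_table]
database_tables = %w[addresses animals government_secrets users]

messages = table_mismatch_messages(schema_tables, database_tables)
puts messages
```

With these inputs, `users` is reported as missing and `old_table` as extra. Identical lists produce no messages.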
data/lib/witsec/alias.rb ADDED
@@ -0,0 +1,29 @@
+ module Witsec
+   class Alias
+     def initialize(table_name, columns:, schema:)
+       @table_name = table_name
+       @columns = columns
+       @schema = schema
+     end
+
+     def anonymize(rows)
+       rows.map do |row|
+         table = schema.anonymized_tables.find { _1.name == table_name }
+
+         columns.each_with_index.map do |column, index|
+           anonymized_column, mask = table.columns.find { |name, _mask| name == column }
+
+           if anonymized_column.present?
+             mask.respond_to?(:call) ? mask.call : mask
+           else
+             row[index]
+           end
+         end
+       end
+     end
+
+     private
+
+     attr_reader :table_name, :columns, :schema
+   end
+ end
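The masking rule in `Alias#anonymize` above can be sketched in plain Ruby: a column with a mask gets the mask's value (called first if it is callable), and every other column keeps its original value. This is not the gem's code. The `mask_rows` helper is hypothetical, and it uses a Hash of masks instead of the gem's list of name and mask pairs.

```ruby
# Hypothetical helper: masks maps a column name to a static value or a callable.
def mask_rows(rows, columns, masks)
  rows.map do |row|
    columns.each_with_index.map do |column, index|
      # Columns without a mask are copied unchanged.
      next row[index] unless masks.key?(column)

      mask = masks[column]
      mask.respond_to?(:call) ? mask.call : mask
    end
  end
end

columns = ["id", "street", "city"]
rows = [[1, "Real Street 1", "Aarhus"], [2, "Real Street 2", "Odense"]]
masks = {
  "street" => -> { "Example Street 42" }, # callable, invoked once per row
  "city" => "New York"                    # static replacement value
}

masked = mask_rows(rows, columns, masks)
p masked
```

The `id` column has no mask and passes through untouched, which mirrors the README's note that unmentioned columns are not anonymized.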
data/lib/witsec/anonymizer.rb ADDED
@@ -0,0 +1,93 @@
+ module Witsec
+   class Anonymizer
+     BATCH_SIZE = 1000
+
+     def initialize
+       @schema = instance_eval(File.read("config/witsec/schema.rb"))
+     end
+
+     attr_reader :schema
+
+     # TODO: Make silence configurable
+     def anonymize
+       time = Benchmark.measure do
+         ActiveRecord::Base.logger.silence do
+           clear_output_database
+
+           ActiveRecord::Base.connection.tables.each do |table_name|
+             if schema.anonymizes?(table_name)
+               # A performance improvement could probably be found here, if we just passed along included tables (as in tables, where no rows are anonymized) without querying etc.
+
+               input_connection = input_connection_pool.lease_connection
+               record_rows = input_connection.execute("SELECT * FROM #{table_name}").to_a
+               columns = record_rows&.first&.keys
+               rows = record_rows.map(&:values)
+               puts "Anonymizing #{table_name} (#{rows.size} rows)"
+               input_connection_pool.release_connection
+
+               anonymized_rows = Witsec::Alias.new(table_name, columns:, schema:).anonymize(rows)
+               output_connection = output_connection_pool.lease_connection
+               # If referential integrity is not disabled, you have to create all rows in the correct order
+               output_connection.disable_referential_integrity do
+                 # Use insert for performance
+                 row_batches = anonymized_rows.in_groups_of(BATCH_SIZE, false)
+                 total = 0
+                 row_batches.each_with_index do |batch, index|
+                   print "Anonymizing up to row #{total + batch.size} of #{rows.size}\r"
+                   total += batch.size
+                   values = batch.map do |row|
+                     "(#{row.map { |value| ActiveRecord::Base.connection.quote(value) }.join(", ")})"
+                   end.join(", ")
+
+                   output_connection.execute(
+                     "INSERT INTO #{table_name} (#{columns.join(", ")}) VALUES #{values}"
+                   )
+                 end
+               end
+
+               output_connection_pool.release_connection
+             else
+               puts "Skipping #{table_name}"
+               next
+             end
+           end
+         end
+       end
+       puts "Anonymized all in #{time.real} seconds"
+     end
+
+     def clear_output_database
+       puts "Clearing output database"
+
+       ActiveRecord::Base.logger.silence do
+         connection = output_connection_pool.lease_connection
+
+         connection.disable_referential_integrity do
+           connection.tables.each do |table_name|
+             connection.execute("TRUNCATE TABLE #{table_name} CASCADE")
+           end
+         end
+
+         output_connection_pool.release_connection
+       end
+     end
+
+     private
+
+     def input_database_configuration
+       Rails.configuration.database_configuration[Rails.env]["primary"]
+     end
+
+     def input_connection_pool
+       ActiveRecord::Base.establish_connection(input_database_configuration)
+     end
+
+     def output_database_configuration
+       Rails.configuration.database_configuration[Rails.env]["anonymized"]
+     end
+
+     def output_connection_pool
+       ActiveRecord::Base.establish_connection(output_database_configuration)
+     end
+   end
+ end
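The anonymizer above writes rows with multi-row `INSERT` statements, one per batch. A rough sketch of how such statements can be assembled in plain Ruby is shown here. It uses `each_slice` instead of ActiveSupport's `in_groups_of`, and a deliberately naive `quote` helper instead of the database adapter's quoting. Both helpers are hypothetical and for illustration only.

```ruby
# Naive quoting for illustration only; real code should rely on the
# database adapter's own quoting.
def quote(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Hypothetical helper: builds one multi-row INSERT statement per batch of rows.
def insert_statements(table_name, columns, rows, batch_size: 2)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map { |row| "(#{row.map { |value| quote(value) }.join(", ")})" }.join(", ")
    "INSERT INTO #{table_name} (#{columns.join(", ")}) VALUES #{values}"
  end
end

rows = [[1, "New York"], [2, "O'Fallon"], [3, nil]]
statements = insert_statements("addresses", ["id", "city"], rows)
puts statements
```

Three rows with a batch size of two yield two statements, the second holding only the remaining row.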
data/lib/witsec/railtie.rb ADDED
@@ -0,0 +1,9 @@
+ module Witsec
+   class Railtie < ::Rails::Railtie
+     rake_tasks do
+       Dir.glob(File.join(File.dirname(__FILE__), "../tasks/**/*.rake")).each do |rake_file|
+         load rake_file
+       end
+     end
+   end
+ end
data/lib/witsec/schema.rb ADDED
@@ -0,0 +1,73 @@
+ module Witsec
+   class Schema
+     class << self
+       def define(version, &block)
+         new(version).define(&block)
+       end
+     end
+
+     def initialize(version)
+       @version = version
+       @anonymized_tables = []
+       @excluded_tables = []
+     end
+
+     attr_reader :version, :anonymized_tables, :excluded_tables
+
+     def define(&block)
+       instance_eval(&block)
+
+       self
+     end
+
+     def exclude_table(name)
+       excluded_tables << name
+     end
+
+     def include_table(name)
+       anonymized_tables << Table.new(name)
+     end
+
+     def anonymize_table(name, &block)
+       anonymized_tables << Table.new(name).define do
+         instance_eval(&block)
+       end
+     end
+
+     def anonymizes?(table_name)
+       anonymized_table_names.include?(table_name)
+     end
+
+     def table_names
+       (anonymized_table_names + excluded_tables).sort
+     end
+
+     private
+
+     def anonymized_table_names
+       anonymized_tables.map(&:name)
+     end
+   end
+
+   class Table
+     def initialize(name)
+       @name = name
+       @columns = []
+     end
+
+     attr_reader :name, :columns
+
+     def define(&block)
+       instance_eval(&block)
+
+       self
+     end
+
+     def column(column_name, using: nil)
+       columns << [column_name, using]
+     end
+   end
+
+   TableMismatchError = Class.new(StandardError)
+   VersionError = Class.new(StandardError)
+ end
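The schema file works because `Schema.define` evaluates its block with `instance_eval`, so bare calls such as `include_table` resolve against the schema object. A stripped-down sketch of that pattern follows. `MiniSchema` and its methods are hypothetical names chosen for illustration and are not part of the gem.

```ruby
# Hypothetical miniature of a block-based DSL built on instance_eval.
class MiniSchema
  attr_reader :version, :included, :excluded

  def self.define(version, &block)
    schema = new(version)
    schema.instance_eval(&block) # run the block with the schema as self
    schema
  end

  def initialize(version)
    @version = version
    @included = []
    @excluded = []
  end

  def include_table(name)
    @included << name
  end

  def exclude_table(name)
    @excluded << name
  end

  def handles?(name)
    @included.include?(name)
  end
end

schema = MiniSchema.define(1) do
  include_table "animals"
  exclude_table "government_secrets"
end

puts schema.handles?("animals")
puts schema.handles?("government_secrets")
```

One trade-off of `instance_eval` is that `self` changes inside the block, so the caller's own methods and instance variables are not reachable there, although local variables still are.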
data/lib/witsec/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Witsec
+   VERSION = "0.1.0"
+ end
data/lib/witsec.rb ADDED
@@ -0,0 +1,9 @@
+ require "witsec/version"
+ require "witsec/railtie"
+ require "witsec/alias"
+ require "witsec/anonymizer"
+ require "witsec/schema"
+
+ module Witsec
+   # Your code goes here...
+ end
metadata ADDED
@@ -0,0 +1,108 @@
+ --- !ruby/object:Gem::Specification
+ name: witsec
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Nicolai Bach Woller
+ bindir: bin
+ cert_chain: []
+ date: 2025-02-24 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rails
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '8.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '8.0'
+ - !ruby/object:Gem::Dependency
+   name: minitest-spec-rails
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: faker
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: standard
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.44'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.44'
+ email:
+ - woller@traels.it
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - MIT-LICENSE
+ - README.md
+ - Rakefile
+ - lib/tasks/witsec_tasks.rake
+ - lib/witsec.rb
+ - lib/witsec/alias.rb
+ - lib/witsec/anonymizer.rb
+ - lib/witsec/railtie.rb
+ - lib/witsec/schema.rb
+ - lib/witsec/version.rb
+ homepage: https://github.com/traels-it/witsec
+ licenses:
+ - MIT
+ metadata:
+   homepage_uri: https://github.com/traels-it/witsec
+   source_code_uri: https://github.com/traels-it/witsec
+   changelog_uri: https://github.com/traels-it/witsec/blob/main/CHANGELOG.md
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.6.2
+ specification_version: 4
+ summary: Anonymize your database for dumping
+ test_files: []