witsec 0.1.0
- checksums.yaml +7 -0
- data/MIT-LICENSE +20 -0
- data/README.md +87 -0
- data/Rakefile +3 -0
- data/lib/tasks/witsec_tasks.rake +71 -0
- data/lib/witsec/alias.rb +29 -0
- data/lib/witsec/anonymizer.rb +93 -0
- data/lib/witsec/railtie.rb +9 -0
- data/lib/witsec/schema.rb +73 -0
- data/lib/witsec/version.rb +3 -0
- data/lib/witsec.rb +9 -0
- metadata +108 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 4ac7324ae9480a4bbabaca4d462bb4205a2fa470b1f872c7226f3a809c2c588f
+  data.tar.gz: 1d2fa60ac0531ad1f010f74658f7a6aeadddd0f343fc4712f9cc2f0caeb097d2
+SHA512:
+  metadata.gz: d580f3f361aaac94331a2935c7ed79da3e82106925f1c0012bc6b6e5319de1e77e1a28e295a8c47eb0ab42c8cec56a11f69b1363034397633fbdf37964750a56
+  data.tar.gz: c46f2392c59e03d4b81ab353c4e83b1c928eaf7974b21e3e6e55c70707fc33b3b9a5937d2990fd8bb47d2a69d3a3b5b481c777b2e5454472d8efe52de58f4e9b
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
+Copyright Nicolai Bach Woller
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,87 @@
+# Witsec
+When developing Rails applications, you end up with a large difference between the size of the database used in development and the database in production. This makes it hard to gauge how performance is impacted as the amount of data grows.
+
+You can try to generate a large set of data, but there is no guarantee that the data you generate will produce the same performance issues as the real data would. Another approach is to download a database dump, but then you have real production data lying on your machine, including any sensitive data like SSNs or addresses.
+
+This gem tries to avoid this by copying all data to a new database, anonymizing it in the process. This new database can then be dumped and used in development.
+
+## Installation
+Add this line to your application's Gemfile:
+
+```ruby
+gem "witsec"
+# if you want a simple way to generate fake data
+gem "faker"
+```
+
+And then execute:
+```bash
+$ bundle
+```
+Then create a new file at `config/witsec/schema.rb`. This path will become configurable in a later version.
+Finally, add a database to your database configuration to store the anonymized data in:
+```yaml
+production:
+  primary:
+    <<: *default
+    host: your_host_url
+    database: your_app_name
+  anonymized:
+    <<: *default
+    host: some_url_that_might_or_might_not_be_the_same_as_your_host_url
+    database: anonymized # This must be called anonymized for now. It will become configurable in a later version.
+    migrations_paths: db/migrate
+```
+
+## Usage
+Witsec uses a schema file to determine what to anonymize and how to do it.
+
+```ruby
+# config/witsec/schema.rb
+Witsec::Schema.define(2025_01_15_142512) do
+  anonymize_table "addresses" do |t|
+    t.column "street", using: -> { Faker::Address.street_address }
+    t.column "zip_code", using: -> { Faker::Address.zip_code }
+    t.column "city", using: "New York"
+  end
+
+  include_table "animals"
+
+  exclude_table "government_secrets"
+end
+```
+`Witsec::Schema.define` requires an integer param. This should match the latest timestamp in your app's `db/schema.rb` and is used to ensure that you have considered any changes introduced in database migrations. If a mismatch is detected when you run the `bin/rails witsec:schema:verify` task, a warning is shown. In a later version, an error will be raised when attempting to anonymize a database with a mismatch in versions.
+
+There are three ways to handle a table:
+
+#### `anonymize_table`
+Takes the name of a table to be anonymized and a block that determines how each column should be masked. In the example above, [Faker](https://github.com/faker-ruby/faker) is used to provide a random address, but you can put whatever you want in the lambda or even provide a static value, as is done for the city column.
+
+Any column not mentioned in the block will **not** be anonymized.
+
+#### `include_table`
+Takes the name of a table to be copied in its entirety without any masking at all. Use this for tables without **any** sensitive data.
+
+#### `exclude_table`
+Takes the name of a table to be excluded. No data will be copied. If any other tables reference anything in an excluded table, you are probably going to have a bad time.
+
+### Rake tasks
+Witsec comes with some tasks for anonymizing the database and verifying that the schema is up to date.
+
+#### `witsec:anonymize`
+Anonymizes the app's primary database using the configuration defined in your schema.
+
+#### `witsec:schema:verify_tables`
+Checks that the tables in your database are all mentioned in your Witsec schema. Useful as a step in your CI.
+
+#### `witsec:schema:verify_version`
+Checks that your `Witsec::Schema` version matches the version of your latest run migration. Useful as a step in your CI.
+
+#### `witsec:schema:verify`
+Runs all other verifications.
+
+## Contributing
+Bug reports and pull requests are welcome on GitHub at https://github.com/traels-it/witsec.
+
+## License
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/Rakefile
ADDED
data/lib/tasks/witsec_tasks.rake
ADDED
@@ -0,0 +1,71 @@
+namespace :witsec do
+  desc "Anonymizes the app's primary database"
+  task anonymize: :environment do
+    Witsec::Anonymizer.new.anonymize
+  end
+
+  namespace :schema do
+    desc "Verify that the Witsec schema defines an anonymization for each table"
+    task :verify_tables, [:raise_error] => :environment do |_t, args|
+      args.with_defaults(raise_error: false)
+
+      anonymizer = Witsec::Anonymizer.new
+
+      if anonymizer.schema.table_names != ActiveRecord::Base.connection.tables.sort
+        missing_tables = ActiveRecord::Base.connection.tables - anonymizer.schema.table_names
+        extra_tables = anonymizer.schema.table_names - ActiveRecord::Base.connection.tables
+
+        messages = ["Witsec schema contains errors"]
+
+        messages << "Missing tables: #{missing_tables.join(", ")}" if missing_tables.any?
+        messages << "Extra tables: #{extra_tables.join(", ")}" if extra_tables.any?
+
+        return unless messages.any?
+
+        if args[:raise_error]
+          raise Witsec::TableMismatchError, messages.join("\n")
+        else
+          abort messages.join("\n")
+        end
+      end
+    end
+
+    desc "Verify the Witsec schema version matches the version of the latest run migration"
+    task :verify_version, [:raise_error] => :environment do |_t, args|
+      args.with_defaults(raise_error: false)
+
+      app_schema_version = ActiveRecord::Base.connection.schema_version
+      witsec_schema_version = Witsec::Anonymizer.new.schema.version
+      message = "Witsec schema version (#{witsec_schema_version}) does not match app's schema version (#{app_schema_version})"
+
+      if app_schema_version != witsec_schema_version
+        if args[:raise_error]
+          raise Witsec::VersionError, message
+        else
+          abort message
+        end
+      end
+    end
+
+    desc "Runs all Witsec verifications"
+    task verify: :environment do
+      error_messages = []
+      begin
+        Rake::Task["witsec:schema:verify_tables"].invoke(true)
+      rescue Witsec::TableMismatchError => error
+        error_messages << error.message
+      end
+      begin
+        Rake::Task["witsec:schema:verify_version"].invoke(true)
+      rescue Witsec::VersionError => error
+        error_messages << error.message
+      end
+
+      if error_messages.any?
+        abort error_messages.join("\n")
+      else
+        puts "Witsec schema is all good 🎉"
+      end
+    end
+  end
+end
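Note on the `verify_tables` task above: the check compares the table names declared in the Witsec schema with the tables in the database and reports the difference in both directions. The following is a minimal standalone sketch of that comparison, not code from the gem. The helper name `table_mismatch_messages` and the sample table names are hypothetical, and plain arrays stand in for `anonymizer.schema.table_names` and `ActiveRecord::Base.connection.tables`.

```ruby
# Hypothetical helper mirroring the comparison in witsec:schema:verify_tables.
# Both arguments are plain arrays of table names.
def table_mismatch_messages(schema_tables, database_tables)
  # No messages when both sides list exactly the same tables.
  return [] if schema_tables.sort == database_tables.sort

  # Tables in the database that the schema does not mention.
  missing = database_tables - schema_tables
  # Tables the schema mentions that do not exist in the database.
  extra = schema_tables - database_tables

  messages = ["Witsec schema contains errors"]
  messages << "Missing tables: #{missing.join(", ")}" if missing.any?
  messages << "Extra tables: #{extra.join(", ")}" if extra.any?
  messages
end

messages = table_mismatch_messages(
  %w[addresses animals old_table],
  %w[addresses animals users]
)
puts messages
```

With the sample input, the schema is missing `users` and still lists `old_table`, so both a "Missing tables" and an "Extra tables" line are produced.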
data/lib/witsec/alias.rb
ADDED
@@ -0,0 +1,29 @@
+module Witsec
+  class Alias
+    def initialize(table_name, columns:, schema:)
+      @table_name = table_name
+      @columns = columns
+      @schema = schema
+    end
+
+    def anonymize(rows)
+      rows.map do |row|
+        table = schema.anonymized_tables.find { _1.name == table_name }
+
+        columns.each_with_index.map do |column, index|
+          anonymized_column, mask = table.columns.find { |name, _mask| name == column }
+
+          if anonymized_column.present?
+            mask.respond_to?(:call) ? mask.call : mask
+          else
+            row[index]
+          end
+        end
+      end
+    end
+
+    private
+
+    attr_reader :table_name, :columns, :schema
+  end
+end
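Note on `Witsec::Alias#anonymize` above: for every row, each column with a configured mask is replaced by the mask's value (calling it if it is callable), and every other column is passed through unchanged. The sketch below illustrates that rule without Rails. The helper `apply_masks`, the use of a Hash for the masks (the gem stores `[name, mask]` pairs on a table object), and the sample data are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the masking rule in Witsec::Alias#anonymize.
# columns: array of column names, rows: array of value arrays,
# masks: Hash of column name => static value or callable.
def apply_masks(columns, rows, masks)
  rows.map do |row|
    columns.each_with_index.map do |column, index|
      if masks.key?(column)
        mask = masks[column]
        # Lambdas are called once per row; static values are used as-is.
        mask.respond_to?(:call) ? mask.call : mask
      else
        # Columns without a mask keep their original value.
        row[index]
      end
    end
  end
end

columns = %w[id street city]
rows = [[1, "1 Real Street", "Copenhagen"], [2, "2 Real Street", "Aarhus"]]

counter = 0
masks = {
  "street" => -> { "#{counter += 1} Fake Street" }, # callable mask
  "city" => "New York"                              # static mask
}

anonymized = apply_masks(columns, rows, masks)
p anonymized
```

The `id` column has no mask and is copied verbatim, which matches the README's statement that columns not mentioned in the block are not anonymized.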
data/lib/witsec/anonymizer.rb
ADDED
@@ -0,0 +1,93 @@
+module Witsec
+  class Anonymizer
+    BATCH_SIZE = 1000
+
+    def initialize
+      @schema = instance_eval(File.read("config/witsec/schema.rb"))
+    end
+
+    attr_reader :schema
+
+    # TODO: Make silence configurable
+    def anonymize
+      time = Benchmark.measure do
+        ActiveRecord::Base.logger.silence do
+          clear_output_database
+
+          ActiveRecord::Base.connection.tables.each do |table_name|
+            if schema.anonymizes?(table_name)
+              # A performance improvement could probably be found here, if we just passed along included tables (as in tables, where no rows are anonymized) without querying etc.
+
+              input_connection = input_connection_pool.lease_connection
+              record_rows = input_connection.execute("SELECT * FROM #{table_name}").to_a
+              columns = record_rows&.first&.keys
+              rows = record_rows.map(&:values)
+              puts "Anonymizing #{table_name} (#{rows.size} rows)"
+              input_connection_pool.release_connection
+
+              anonymized_rows = Witsec::Alias.new(table_name, columns:, schema:).anonymize(rows)
+              output_connection = output_connection_pool.lease_connection
+              # If referential integrity is not disabled, you have to create all rows in the correct order
+              output_connection.disable_referential_integrity do
+                # Use insert for performance
+                row_batches = anonymized_rows.in_groups_of(BATCH_SIZE, false)
+                total = 0
+                row_batches.each_with_index do |batch, index|
+                  print "Anonymizing up to row #{total + batch.size} of #{rows.size}\r"
+                  total += batch.size
+                  values = batch.map do |row|
+                    "(#{row.map { |value| ActiveRecord::Base.connection.quote(value) }.join(", ")})"
+                  end.join(", ")
+
+                  output_connection.execute(
+                    "INSERT INTO #{table_name} (#{columns.join(", ")}) VALUES #{values}"
+                  )
+                end
+              end
+
+              output_connection_pool.release_connection
+            else
+              puts "Skipping #{table_name}"
+              next
+            end
+          end
+        end
+      end
+      puts "Anonymized all in #{time.real} seconds"
+    end
+
+    def clear_output_database
+      puts "Clearing output database"
+
+      ActiveRecord::Base.logger.silence do
+        connection = output_connection_pool.lease_connection
+
+        connection.disable_referential_integrity do
+          connection.tables.each do |table_name|
+            connection.execute("TRUNCATE TABLE #{table_name} CASCADE")
+          end
+        end
+
+        output_connection_pool.release_connection
+      end
+    end
+
+    private
+
+    def input_database_configuration
+      Rails.configuration.database_configuration[Rails.env]["primary"]
+    end
+
+    def input_connection_pool
+      ActiveRecord::Base.establish_connection(input_database_configuration)
+    end
+
+    def output_database_configuration
+      Rails.configuration.database_configuration[Rails.env]["anonymized"]
+    end
+
+    def output_connection_pool
+      ActiveRecord::Base.establish_connection(output_database_configuration)
+    end
+  end
+end
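Note on the batched inserts in `Witsec::Anonymizer#anonymize` above: anonymized rows are split into groups of `BATCH_SIZE` and each group becomes one multi-row `INSERT` statement. The sketch below shows that idea in plain Ruby under several assumptions: `insert_statements` and `quote` are hypothetical helpers, `each_slice` from the standard library stands in for ActiveSupport's `in_groups_of(BATCH_SIZE, false)`, and the quoting is deliberately simplified. Real code should rely on the database adapter's quoting, as the gem does via `ActiveRecord::Base.connection.quote`.

```ruby
# Hypothetical, simplified quoting for illustration only.
def quote(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'" # double single quotes to escape them
  end
end

# Builds one multi-row INSERT statement per batch of rows.
def insert_statements(table_name, columns, rows, batch_size: 1000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |row|
      "(#{row.map { |value| quote(value) }.join(", ")})"
    end.join(", ")

    "INSERT INTO #{table_name} (#{columns.join(", ")}) VALUES #{values}"
  end
end

statements = insert_statements(
  "addresses",
  %w[id city],
  [[1, "New York"], [2, "O'Fallon"], [3, nil]],
  batch_size: 2 # tiny batch size so the example yields two statements
)
puts statements
```

Fewer, larger statements reduce round trips to the output database. The batch size of 1000 matches the constant in the gem, while the value of 2 in the usage example is only there to make the batching visible.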
data/lib/witsec/schema.rb
ADDED
@@ -0,0 +1,73 @@
+module Witsec
+  class Schema
+    class << self
+      def define(version, &block)
+        new(version).define(&block)
+      end
+    end
+
+    def initialize(version)
+      @version = version
+      @anonymized_tables = []
+      @excluded_tables = []
+    end
+
+    attr_reader :version, :anonymized_tables, :excluded_tables
+
+    def define(&block)
+      instance_eval(&block)
+
+      self
+    end
+
+    def exclude_table(name)
+      excluded_tables << name
+    end
+
+    def include_table(name)
+      anonymized_tables << Table.new(name)
+    end
+
+    def anonymize_table(name, &block)
+      anonymized_tables << Table.new(name).define do
+        instance_eval(&block)
+      end
+    end
+
+    def anonymizes?(table_name)
+      anonymized_table_names.include?(table_name)
+    end
+
+    def table_names
+      (anonymized_table_names + excluded_tables).sort
+    end
+
+    private
+
+    def anonymized_table_names
+      anonymized_tables.map(&:name)
+    end
+  end
+
+  class Table
+    def initialize(name)
+      @name = name
+      @columns = []
+    end
+
+    attr_reader :name, :columns
+
+    def define(&block)
+      instance_eval(&block)
+
+      self
+    end
+
+    def column(column_name, using: nil)
+      columns << [column_name, using]
+    end
+  end
+
+  TableMismatchError = Class.new(StandardError)
+  VersionError = Class.new(StandardError)
+end
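Note on the schema DSL above: `Witsec::Schema` and `Witsec::Table` evaluate the user's blocks with `instance_eval`, which sets `self` to the receiver and also passes the receiver as the block argument. That is why the README can write `anonymize_table "addresses" do |t| ... t.column ... end`. The following is a reduced sketch of the same pattern, not the gem's code. The `DslSketch` module and its slightly simplified method bodies are assumptions for illustration.

```ruby
# A reduced, standalone imitation of the instance_eval-based DSL.
module DslSketch
  class Table
    attr_reader :name, :columns

    def initialize(name)
      @name = name
      @columns = []
    end

    # Records a [column_name, mask] pair, as the gem's Table#column does.
    def column(column_name, using: nil)
      columns << [column_name, using]
    end
  end

  class Schema
    attr_reader :version, :tables, :excluded

    def self.define(version, &block)
      schema = new(version)
      schema.instance_eval(&block) # DSL calls resolve against the schema
      schema
    end

    def initialize(version)
      @version = version
      @tables = []
      @excluded = []
    end

    def anonymize_table(name, &block)
      table = Table.new(name)
      # instance_eval sets self to the table and also yields the table,
      # so both `column ...` and `|t| t.column ...` work inside the block.
      table.instance_eval(&block)
      tables << table
    end

    def include_table(name)
      tables << Table.new(name)
    end

    def exclude_table(name)
      excluded << name
    end

    def table_names
      (tables.map(&:name) + excluded).sort
    end
  end
end

schema = DslSketch::Schema.define(2025_01_15_142512) do
  anonymize_table "addresses" do |t|
    t.column "city", using: "New York"
  end
  include_table "animals"
  exclude_table "government_secrets"
end

p schema.table_names
```

As in the gem, an included table is simply a table with no masked columns, so it is copied as-is, while excluded tables are only tracked by name for the verification tasks.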
data/lib/witsec.rb
ADDED
metadata
ADDED
@@ -0,0 +1,108 @@
+--- !ruby/object:Gem::Specification
+name: witsec
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Nicolai Bach Woller
+bindir: bin
+cert_chain: []
+date: 2025-02-24 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rails
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '8.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '8.0'
+- !ruby/object:Gem::Dependency
+  name: minitest-spec-rails
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: faker
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: standard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.44'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.44'
+email:
+- woller@traels.it
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- MIT-LICENSE
+- README.md
+- Rakefile
+- lib/tasks/witsec_tasks.rake
+- lib/witsec.rb
+- lib/witsec/alias.rb
+- lib/witsec/anonymizer.rb
+- lib/witsec/railtie.rb
+- lib/witsec/schema.rb
+- lib/witsec/version.rb
+homepage: https://github.com/traels-it/witsec
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/traels-it/witsec
+  source_code_uri: https://github.com/traels-it/witsec
+  changelog_uri: https://github.com/traels-it/witsec/blob/main/CHANGELOG.md
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.6.2
+specification_version: 4
+summary: Anonymize your database for dumping
+test_files: []