data-anonymization 0.2.0 → 0.3.0
- data/README.md +67 -43
- data/data-anonymization.gemspec +1 -0
- data/lib/core/database.rb +21 -3
- data/lib/core/dsl.rb +3 -1
- data/lib/data-anonymization.rb +2 -0
- data/lib/parallel/table.rb +13 -0
- data/lib/strategy/base.rb +9 -5
- data/lib/strategy/field/default_anon.rb +3 -2
- data/lib/strategy/field/fields.rb +2 -0
- data/lib/strategy/field/number/random_big_decimal_delta.rb +19 -0
- data/lib/strategy/field/number/random_float_delta.rb +1 -6
- data/lib/strategy/field/string/random_url.rb +30 -0
- data/lib/strategy/field/string/select_from_database.rb +2 -1
- data/lib/utils/database.rb +1 -0
- data/lib/utils/progress_bar.rb +30 -3
- data/lib/utils/random_string_chars_only.rb +14 -0
- data/lib/version.rb +1 -1
- data/spec/acceptance/rdbms_blacklist_spec.rb +1 -0
- data/spec/acceptance/rdbms_whitelist_spec.rb +7 -2
- data/spec/spec_helper.rb +1 -1
- data/spec/strategy/field/datetime/date_delta_spec.rb +0 -1
- data/spec/strategy/field/default_anon_spec.rb +7 -0
- data/spec/strategy/field/number/random_big_decimal_delta_spec.rb +20 -0
- data/spec/strategy/field/number/random_float_delta_spec.rb +4 -6
- data/spec/strategy/field/string/random_url_spec.rb +15 -0
- data/spec/strategy/field/string/select_from_database_spec.rb +2 -6
- data/spec/support/customer_sample.rb +7 -1
- metadata +26 -2
data/README.md
CHANGED
@@ -2,7 +2,7 @@
|
|
2
2
|
Tool to create an anonymized production data dump for use in PERF and other TEST environments.
|
3
3
|
|
4
4
|
## Getting started
|
5
|
-
Install gem using
|
5
|
+
Install gem using:
|
6
6
|
|
7
7
|
$ gem install data-anonymization
|
8
8
|
|
@@ -41,23 +41,59 @@ Run using:
|
|
41
41
|
|
42
42
|
## Examples
|
43
43
|
|
44
|
-
1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
|
45
|
-
2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
|
44
|
+
1. [Whitelist using Chinook sample database](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
|
45
|
+
2. [Blacklist using Chinook sample database](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
|
46
|
+
3. [Whitelist with composite primary key using DellStore sample database](https://github.com/sunitparekh/test-anonymization/blob/master/dell_whitelist.rb)
|
47
|
+
4. [Blacklist with composite primary key using DellStore sample database](https://github.com/sunitparekh/test-anonymization/blob/master/dell_blacklist.rb)
|
48
|
+
|
49
|
+
## Changelog
|
50
|
+
|
51
|
+
#### 0.3.0 (Sep 4, 2012)
|
52
|
+
|
53
|
+
Major changes:
|
54
|
+
|
55
|
+
1. Added support for Parallel table execution
|
56
|
+
2. Changed the default String strategy from LoremIpsum to RandomString, based on end-user feedback.
|
57
|
+
3. Fixed an issue with tables that have a column named 'type', which is the default column name ActiveRecord uses for single-table inheritance (STI).
|
58
|
+
|
59
|
+
Please see the [GitHub 0.3.0 milestone page](https://github.com/sunitparekh/data-anonymization/issues?milestone=1&page=1&state=open) for more details on the changes and fixes in release 0.3.0.
|
60
|
+
|
61
|
+
#### 0.2.0 (August 16, 2012)
|
62
|
+
|
63
|
+
1. Added a progress bar using the 'powerbar' gem, which also shows the ETA for each table.
|
64
|
+
2. Added more strategies
|
65
|
+
3. Fixed default anonymization strategies for boolean and integer values
|
66
|
+
4. Added support for composite primary key
|
67
|
+
|
68
|
+
#### 0.1.2 (August 14, 2012)
|
69
|
+
|
70
|
+
1. Initial release
|
71
|
+
|
72
|
+
## Roadmap
|
73
|
+
|
74
|
+
#### 0.4.0
|
75
|
+
|
76
|
+
1. MongoDB anonymization support (NoSQL document based database support)
|
77
|
+
|
78
|
+
#### 0.5.0
|
79
|
+
|
80
|
+
1. Generate DSL from database and build schema from source as part of Whitelist approach.
|
46
81
|
|
47
82
|
#### Share feedback
|
48
83
|
Please use Github [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback, feature suggestions and report issues.
|
49
84
|
|
50
85
|
## What is data anonymization?
|
51
86
|
|
52
|
-
For almost all projects there is a need
|
53
|
-
However, getting production data and using it is not feasible due to multiple reasons,
|
87
|
+
Almost every project needs a production data dump to run performance tests, rehearse production releases, and debug production issues.
|
88
|
+
However, obtaining and using production data is often not feasible for several reasons, the primary one being privacy concerns around user data. Hence the need for data anonymization.
|
54
89
|
This tool helps you produce an anonymized production data dump using either the Blacklist or the Whitelist strategy.
|
55
90
|
|
56
91
|
## Anonymization Strategies
|
57
92
|
|
58
93
|
### Blacklist
|
59
94
|
This approach essentially leaves all fields unchanged with the exception of those specified by the user, which are scrambled/anonymized (hence the name blacklist).
|
60
|
-
Blacklist create a copy of prod database and chooses the fields to be anonymized like e.g. username, password, email, name, geo location etc. based on user specification Most of the fields
|
95
|
+
The `Blacklist` strategy works on a copy of the prod database and anonymizes only the fields the user specifies, e.g. username, password, email, name, geo location. Most fields have their own rules, e.g. the password should be set to the same value for all users, and the email needs to be valid.
|
96
|
+
|
61
97
|
The problem with this approach is that when new fields are added they will not be anonymized by default. Human error in omitting users' personal data could be damaging.
|
62
98
|
|
63
99
|
```ruby
|
@@ -70,9 +106,9 @@ end
|
|
70
106
|
|
71
107
|
### Whitelist
|
72
108
|
By default, this approach scrambles/anonymizes all fields except a list of fields that are allowed to be copied as is (hence the name whitelist).
|
73
|
-
By default all data needs to be anonymized. So from production database
|
74
|
-
All fields would be anonymized using default anonymization
|
75
|
-
A
|
109
|
+
By default, all data needs to be anonymized. Data from the production database is sanitized record by record and inserted as anonymized data into the destination database. The source database connection should be read-only.
|
110
|
+
All fields are anonymized using the default anonymization strategy for their datatype, unless a special anonymization strategy is specified. For instance, special strategies could be used for emails, passwords, usernames, etc.
|
111
|
+
A whitelisted field implies that it's okay to copy the data as is and anonymization isn't required.
|
76
112
|
This way any new field will be anonymized by default; if a field is needed as is, add it to the whitelist explicitly. This prevents human error and protects sensitive information.
|
77
113
|
|
78
114
|
```ruby
|
@@ -84,6 +120,25 @@ database 'DatabaseName' do
|
|
84
120
|
end
|
85
121
|
```
|
86
122
|
|
123
|
+
## Tips
|
124
|
+
|
125
|
+
1. In the Whitelist approach, make the source database connection READONLY.
|
126
|
+
2. Change the [default field strategies](#default-field-strategies) to avoid repeating the same strategy throughout your DSL.
|
127
|
+
3. To run anonymization in parallel at the table level, use the DataAnon::Parallel::Table strategy, provided the tables have no FK constraints.
|
128
|
+
|
129
|
+
## Running in Parallel
|
130
|
+
The tool can currently run anonymization in parallel at the table level, provided the tables have no FK constraints.
|
131
|
+
It uses the [Parallel gem](https://github.com/grosser/parallel) by Michael Grosser.
|
132
|
+
By default, it starts multiple parallel Ruby processes, each processing one table at a time.
|
133
|
+
```ruby
|
134
|
+
database 'DellStore' do
|
135
|
+
strategy DataAnon::Strategy::Whitelist
|
136
|
+
execution_strategy DataAnon::Parallel::Table # by default sequential table processing
|
137
|
+
...
|
138
|
+
end
|
139
|
+
```
|
140
|
+
|
141
|
+
|
87
142
|
## DataAnon::Core::Field
|
88
143
|
The object that gets passed along with the field strategies.
|
89
144
|
|
@@ -386,7 +441,6 @@ write your own anonymous field strategies within DSL,
|
|
386
441
|
end
|
387
442
|
```
|
388
443
|
|
389
|
-
|
390
444
|
## Default field strategies
|
391
445
|
|
392
446
|
```ruby
|
@@ -428,38 +482,6 @@ ENV['show_progress'] = 'false'
|
|
428
482
|
DataAnon::Utils::Logging.logger.level = Logger::INFO
|
429
483
|
```
|
430
484
|
|
431
|
-
## Changelog
|
432
|
-
|
433
|
-
#### 0.2.0
|
434
|
-
|
435
|
-
1. Added the progress bar using 'powerbar' gem. Which also shows the ETA for each table.
|
436
|
-
2. Added More strategies
|
437
|
-
3. Fixed default anonymization strategies for boolean and integer values
|
438
|
-
4. Added support for composite primary key
|
439
|
-
|
440
|
-
#### 0.1.2 (August 14, 2012)
|
441
|
-
|
442
|
-
1. First initial release
|
443
|
-
|
444
|
-
## Roadmap
|
445
|
-
|
446
|
-
#### 0.2.0
|
447
|
-
|
448
|
-
1. Complete list of all the field strategies planned supporting all data types
|
449
|
-
|
450
|
-
#### 0.3.0
|
451
|
-
|
452
|
-
1. Run anonymization in parallel threads (performance enchantments)
|
453
|
-
|
454
|
-
#### 0.4.0
|
455
|
-
|
456
|
-
1. MongoDB anonymization support (NoSQL document based database support)
|
457
|
-
|
458
|
-
#### 0.5.0
|
459
|
-
|
460
|
-
1. Generate DSL from database and build schema from source as part of Whitelist approach.
|
461
|
-
|
462
|
-
|
463
485
|
## Want to contribute?
|
464
486
|
|
465
487
|
1. Fork it
|
@@ -476,6 +498,8 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
|
|
476
498
|
|
477
499
|
- [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make it open source.
|
478
500
|
- [Birinder](https://twitter.com/birinder_) and [Panda](https://twitter.com/sarbashrestha) for reviewing the documentation.
|
479
|
-
|
501
|
+
- [Dan Abel](http://www.linkedin.com/pub/dan-abel/0/61b/9b0) for introducing me to the Blacklist and Whitelist approaches for data anonymization.
|
502
|
+
- [Chirag Doshi](https://twitter.com/chiragsdoshi) for encouraging me to get this done.
|
503
|
+
- [Aditya Karle](https://twitter.com/adityakarle) for the Logo. (Coming Soon...)
|
480
504
|
|
481
505
|
|
data/data-anonymization.gemspec
CHANGED
data/lib/core/database.rb
CHANGED
@@ -7,18 +7,26 @@ module DataAnon
|
|
7
7
|
@name = name
|
8
8
|
@strategy = DataAnon::Strategy::Whitelist
|
9
9
|
@user_defaults = {}
|
10
|
+
@tables = []
|
11
|
+
@execution_strategy = DataAnon::Core::Sequential
|
12
|
+
ENV['parallel_execution'] = 'false'
|
10
13
|
end
|
11
14
|
|
12
15
|
def strategy strategy
|
13
16
|
@strategy = strategy
|
14
17
|
end
|
15
18
|
|
19
|
+
def execution_strategy execution_strategy
|
20
|
+
@execution_strategy = execution_strategy
|
21
|
+
ENV['parallel_execution'] = 'true' if execution_strategy == DataAnon::Parallel::Table
|
22
|
+
end
|
23
|
+
|
16
24
|
def source_db connection_spec
|
17
|
-
|
25
|
+
@source_database = connection_spec
|
18
26
|
end
|
19
27
|
|
20
28
|
def destination_db connection_spec
|
21
|
-
|
29
|
+
@destination_database = connection_spec
|
22
30
|
end
|
23
31
|
|
24
32
|
def default_field_strategies default_strategies
|
@@ -26,10 +34,20 @@ module DataAnon
|
|
26
34
|
end
|
27
35
|
|
28
36
|
def table (name, &block)
|
29
|
-
@strategy.new(name, @user_defaults).process_fields(&block)
|
37
|
+
table = @strategy.new(@source_database, @destination_database, name, @user_defaults).process_fields(&block)
|
38
|
+
@tables<< table
|
39
|
+
end
|
40
|
+
|
41
|
+
def anonymize
|
42
|
+
@execution_strategy.new.anonymize @tables
|
30
43
|
end
|
31
44
|
|
45
|
+
end
|
32
46
|
|
47
|
+
class Sequential
|
48
|
+
def anonymize tables
|
49
|
+
tables.each { |table| table.process }
|
50
|
+
end
|
33
51
|
end
|
34
52
|
|
35
53
|
end
|
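The `Database` changes above defer table processing: `table` now collects strategy objects in `@tables`, and `anonymize` hands the whole list to the configured `@execution_strategy`. A minimal, self-contained sketch of that hand-off follows. `FakeTable`, `SequentialRunner`, `ThreadedRunner`, and `run_all` are hypothetical names for illustration; the threaded runner is only a stand-in to show that the strategy is swappable, and it is not how `DataAnon::Parallel::Table` works (that file's body is not shown in this diff).

```ruby
# Hypothetical stand-in for a table strategy object; only #process matters here.
class FakeTable
  attr_reader :name, :processed

  def initialize(name)
    @name = name
    @processed = false
  end

  def process
    @processed = true
  end
end

# Mirrors the shape of DataAnon::Core::Sequential: one table after another.
class SequentialRunner
  def anonymize(tables)
    tables.each(&:process)
  end
end

# Assumption: a thread-based runner, used only to show a swappable strategy.
class ThreadedRunner
  def anonymize(tables)
    tables.map { |table| Thread.new { table.process } }.each(&:join)
  end
end

# Any class whose instances respond to #anonymize(tables) can be passed in.
def run_all(tables, execution_strategy = SequentialRunner)
  execution_strategy.new.anonymize(tables)
end

tables = %w[customers orders products].map { |name| FakeTable.new(name) }
run_all(tables)                  # sequential by default
run_all(tables, ThreadedRunner)  # swap in a different execution strategy
```

The design keeps `Database` unaware of how tables are scheduled; it only needs the strategy class to expose `anonymize`.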
data/lib/core/dsl.rb
CHANGED
@@ -5,7 +5,9 @@ module DataAnon
|
|
5
5
|
|
6
6
|
def database(name, &block)
|
7
7
|
logger.debug "Processing Database: #{name}"
|
8
|
-
DataAnon::Core::Database.new(name)
|
8
|
+
database = DataAnon::Core::Database.new(name)
|
9
|
+
database.instance_eval &block
|
10
|
+
database.anonymize
|
9
11
|
end
|
10
12
|
|
11
13
|
end
|
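The `database` method now builds a `Database`, evaluates the DSL block against it with `instance_eval`, and only then calls `anonymize`. The sketch below shows why `instance_eval` lets bare calls such as `table` inside the block resolve to methods on the new object. `MiniDatabase` and `mini_database` are hypothetical and not part of the gem.

```ruby
class MiniDatabase
  attr_reader :name, :tables, :ran

  def initialize(name)
    @name = name
    @tables = []
    @ran = false
  end

  # Called from inside the DSL block; self is the MiniDatabase instance there.
  def table(table_name)
    @tables << table_name
  end

  def anonymize
    @ran = true
  end
end

def mini_database(name, &block)
  database = MiniDatabase.new(name)
  database.instance_eval(&block) # run the block with self set to `database`
  database.anonymize             # process only after the whole block is read
  database
end

db = mini_database('Chinook') do
  table 'Customer'
  table 'Invoice'
end
puts db.tables.inspect # prints ["Customer", "Invoice"]
```

Running `anonymize` after the block has been fully evaluated means settings declared anywhere in the block, such as an execution strategy, are known before any table is processed.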
data/lib/data-anonymization.rb
CHANGED
@@ -4,9 +4,11 @@ require "utils/logging"
|
|
4
4
|
require "utils/random_int"
|
5
5
|
require "utils/random_float"
|
6
6
|
require "utils/random_string"
|
7
|
+
require "utils/random_string_chars_only"
|
7
8
|
require "utils/geojson_parser"
|
8
9
|
require "utils/progress_bar"
|
9
10
|
require "utils/resource"
|
11
|
+
require "parallel/table"
|
10
12
|
require "core/database"
|
11
13
|
require "core/field"
|
12
14
|
require "strategy/strategies"
|
data/lib/strategy/base.rb
CHANGED
@@ -1,14 +1,14 @@
|
|
1
|
-
require 'powerbar'
|
2
|
-
|
3
1
|
module DataAnon
|
4
2
|
module Strategy
|
5
3
|
class Base
|
6
4
|
include Utils::Logging
|
7
5
|
|
8
|
-
def initialize name, user_strategies
|
6
|
+
def initialize source_database, destination_database, name, user_strategies
|
9
7
|
@name = name
|
10
8
|
@user_strategies = user_strategies
|
11
9
|
@fields = {}
|
10
|
+
@source_database = source_database
|
11
|
+
@destination_database = destination_database
|
12
12
|
end
|
13
13
|
|
14
14
|
def process_fields &block
|
@@ -50,11 +50,15 @@ module DataAnon
|
|
50
50
|
end
|
51
51
|
|
52
52
|
def dest_table
|
53
|
-
@dest_table
|
53
|
+
return @dest_table unless @dest_table.nil?
|
54
|
+
DataAnon::Utils::DestinationDatabase.establish_connection @destination_database if @destination_database
|
55
|
+
@dest_table = Utils::DestinationTable.create @name, @primary_keys
|
54
56
|
end
|
55
57
|
|
56
58
|
def source_table
|
57
|
-
@source_table
|
59
|
+
return @source_table unless @source_table.nil?
|
60
|
+
DataAnon::Utils::SourceDatabase.establish_connection @source_database
|
61
|
+
@source_table = Utils::SourceTable.create @name, @primary_keys
|
58
62
|
end
|
59
63
|
|
60
64
|
def process
|
@@ -4,10 +4,11 @@ module DataAnon
|
|
4
4
|
|
5
5
|
class DefaultAnon
|
6
6
|
|
7
|
-
DEFAULT_STRATEGIES = {:string => FieldStrategy::
|
7
|
+
DEFAULT_STRATEGIES = {:string => FieldStrategy::RandomString.new,
|
8
8
|
:fixnum => FieldStrategy::RandomIntegerDelta.new(5),
|
9
9
|
:bignum => FieldStrategy::RandomIntegerDelta.new(5000),
|
10
10
|
:float => FieldStrategy::RandomFloatDelta.new(5.0),
|
11
|
+
:bigdecimal => FieldStrategy::RandomBigDecimalDelta.new(500.0),
|
11
12
|
:datetime => FieldStrategy::DateTimeDelta.new,
|
12
13
|
:time => FieldStrategy::TimeDelta.new,
|
13
14
|
:date => FieldStrategy::DateDelta.new,
|
@@ -21,7 +22,7 @@ module DataAnon
|
|
21
22
|
|
22
23
|
def anonymize field
|
23
24
|
strategy = @user_defaults[field.value.class.to_s.downcase.to_sym]
|
24
|
-
raise "No strategy defined for datatype #{field.value.class}" unless strategy
|
25
|
+
raise "No strategy defined for datatype #{field.value.class}. Use 'default_field_strategies' option in your script. Refer to http://sunitparekh.github.com/data-anonymization/#default-field-strategies for more details. " unless strategy
|
25
26
|
strategy.anonymize field
|
26
27
|
end
|
27
28
|
|
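`DefaultAnon#anonymize` picks a strategy by turning the value's class name into a symbol key. The sketch below isolates that lookup, with plain lambdas standing in for the gem's strategy objects; `SKETCH_DEFAULTS`, `type_key`, and `anonymize_by_type` are hypothetical names. One point to keep in mind when reading the `:fixnum` and `:bignum` keys above: since Ruby 2.4, `Fixnum` and `Bignum` are unified into `Integer`, so on a modern Ruby the derived key would be `:integer`.

```ruby
# Lambdas stand in for strategy objects; the keys follow the class-name convention.
SKETCH_DEFAULTS = {
  string:  ->(value) { 'x' * value.length },
  integer: ->(value) { value + rand(-5..5) },
  float:   ->(value) { value + rand(-5.0..5.0) }
}.freeze

# "String" -> :string, "Integer" -> :integer, "Float" -> :float
def type_key(value)
  value.class.to_s.downcase.to_sym
end

def anonymize_by_type(value, strategies = SKETCH_DEFAULTS)
  strategy = strategies[type_key(value)]
  raise ArgumentError, "No strategy defined for datatype #{value.class}" unless strategy

  strategy.call(value)
end

puts anonymize_by_type('secret') # prints xxxxxx
```

A value whose class has no entry (for example `nil`, which maps to `:nilclass`) raises, which is the situation the longer error message in this hunk is meant to explain.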
@@ -7,6 +7,7 @@ require 'strategy/field/anonymous'
|
|
7
7
|
require 'strategy/field/string/lorem_ipsum'
|
8
8
|
require 'strategy/field/string/string_template'
|
9
9
|
require 'strategy/field/string/random_string'
|
10
|
+
require 'strategy/field/string/random_url'
|
10
11
|
require 'strategy/field/string/formatted_string_numbers'
|
11
12
|
|
12
13
|
require 'strategy/field/string/select_from_file'
|
@@ -18,6 +19,7 @@ require 'strategy/field/number/random_integer'
|
|
18
19
|
require 'strategy/field/number/random_float'
|
19
20
|
require 'strategy/field/number/random_integer_delta'
|
20
21
|
require 'strategy/field/number/random_float_delta'
|
22
|
+
require 'strategy/field/number/random_big_decimal_delta'
|
21
23
|
|
22
24
|
# contact
|
23
25
|
require 'strategy/field/contact/geojson_base'
|
@@ -0,0 +1,19 @@
|
|
1
|
+
require 'bigdecimal'
|
2
|
+
|
3
|
+
module DataAnon
|
4
|
+
module Strategy
|
5
|
+
module Field
|
6
|
+
class RandomBigDecimalDelta
|
7
|
+
|
8
|
+
def initialize delta = 100.0
|
9
|
+
@delta = delta
|
10
|
+
end
|
11
|
+
|
12
|
+
def anonymize field
|
13
|
+
return BigDecimal.new("#{field.value + DataAnon::Utils::RandomFloat.generate(-@delta, +@delta)}")
|
14
|
+
end
|
15
|
+
|
16
|
+
end
|
17
|
+
end
|
18
|
+
end
|
19
|
+
end
|
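`RandomBigDecimalDelta` shifts the original value by a random offset in `[-delta, +delta]` and returns a `BigDecimal`. A minimal sketch of the same idea follows, under two assumptions that differ from the diff: it uses the `BigDecimal()` conversion method, because `BigDecimal.new` was removed in later versions of the bigdecimal library, and it converts the offset to `BigDecimal` before adding so that a very large original value keeps its digits. `big_decimal_delta` is a hypothetical helper, and `Random#rand` stands in for `DataAnon::Utils::RandomFloat.generate`, whose body is not shown here.

```ruby
require 'bigdecimal'

# Returns value shifted by a random offset within [-delta, +delta].
def big_decimal_delta(value, delta = 100.0, rng = Random.new)
  offset = rng.rand(-delta..delta)  # Float drawn from the inclusive range
  value + BigDecimal(offset.to_s)   # BigDecimal + BigDecimal stays exact
end

original = BigDecimal('53422342378687687342893.23324')
shifted  = big_decimal_delta(original, 500.0)

puts shifted.is_a?(BigDecimal)            # prints true
puts((shifted - original).abs <= 500)     # prints true
```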
@@ -8,14 +8,9 @@ module DataAnon
|
|
8
8
|
end
|
9
9
|
|
10
10
|
def anonymize field
|
11
|
-
|
11
|
+
return field.value + DataAnon::Utils::RandomFloat.generate(-@delta, +@delta)
|
12
12
|
end
|
13
13
|
|
14
|
-
def range (min, max)
|
15
|
-
Random.new.rand * (max-min) + min
|
16
|
-
end
|
17
|
-
|
18
|
-
|
19
14
|
end
|
20
15
|
end
|
21
16
|
end
|
@@ -0,0 +1,30 @@
|
|
1
|
+
module DataAnon
|
2
|
+
module Strategy
|
3
|
+
module Field
|
4
|
+
class RandomUrl
|
5
|
+
|
6
|
+
def anonymize field
|
7
|
+
|
8
|
+
url = field.value
|
9
|
+
randomized_url = ""
|
10
|
+
protocols = url.scan(/http:\/\/|www\./)
|
11
|
+
protocols.each do |token|
|
12
|
+
url = url.gsub(token,"")
|
13
|
+
randomized_url += token
|
14
|
+
end
|
15
|
+
|
16
|
+
marker_position = 0
|
17
|
+
|
18
|
+
while marker_position < url.length
|
19
|
+
special_char_index = url.index(/\W/, marker_position) || url.length
|
20
|
+
text = url[marker_position...special_char_index]
|
21
|
+
randomized_url += "#{DataAnon::Utils::RandomStringCharsOnly.generate(text.length)}#{url[special_char_index]}"
|
22
|
+
marker_position = special_char_index + 1
|
23
|
+
end
|
24
|
+
|
25
|
+
randomized_url
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
29
|
+
end
|
30
|
+
end
|
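`RandomUrl` keeps the URL's shape (the `http://` and `www.` tokens and every non-word character) and replaces each run of word characters with random letters of the same length. A compact sketch of that idea follows, assuming a leading-prefix match and `gsub` in place of the manual marker loop above. `randomize_url` and `SKETCH_LETTERS` are hypothetical names, and unlike the code in the diff this version also preserves an `https://` prefix.

```ruby
SKETCH_LETTERS = ('a'..'z').to_a.freeze

def randomize_url(url, rng = Random.new)
  # Optional scheme and "www." at the very start are kept verbatim.
  prefix = url[%r{\A(?:https?://)?(?:www\.)?}]
  rest = url[prefix.length..-1]

  # Each run of word characters becomes random letters of equal length;
  # dots, slashes, and other punctuation are left where they were.
  randomized = rest.gsub(/\w+/) do |word|
    Array.new(word.length) { SKETCH_LETTERS[rng.rand(SKETCH_LETTERS.size)] }.join
  end

  prefix + randomized
end

puts randomize_url('http://www.fakeurl.com/blog?id=42')
```

Because only word characters are replaced, the result has the same length and punctuation layout as the input, so it still looks like a URL to downstream format checks.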
@@ -5,7 +5,8 @@ module DataAnon
|
|
5
5
|
class SelectFromDatabase
|
6
6
|
include Utils::Logging
|
7
7
|
|
8
|
-
def initialize table_name, field_name
|
8
|
+
def initialize table_name, field_name, connection_spec
|
9
|
+
DataAnon::Utils::SourceDatabase.establish_connection connection_spec
|
9
10
|
source = Utils::SourceTable.create table_name, []
|
10
11
|
@values = source.select(field_name).uniq.collect { |record| record[field_name]}
|
11
12
|
logger.debug "For field strategy #{table_name}:#{field_name} using values #{@values} "
|
data/lib/utils/database.rb
CHANGED
@@ -29,6 +29,7 @@ module DataAnon
|
|
29
29
|
self.table_name = table_name
|
30
30
|
self.primary_keys = primary_keys if primary_keys.length > 1
|
31
31
|
self.primary_key = primary_keys[0] if primary_keys.length == 1
|
32
|
+
self.inheritance_column = :_type_disabled
|
32
33
|
self.mass_assignment_sanitizer = MassAssignmentIgnoreSanitizer.new(self)
|
33
34
|
end
|
34
35
|
end
|
data/lib/utils/progress_bar.rb
CHANGED
@@ -1,24 +1,51 @@
|
|
1
|
+
require 'powerbar'
|
2
|
+
|
1
3
|
module DataAnon
|
2
4
|
module Utils
|
3
5
|
|
4
6
|
class ProgressBar
|
7
|
+
include Utils::Logging
|
5
8
|
|
6
9
|
def initialize table_name, total
|
7
10
|
@total = total
|
8
11
|
@table_name = table_name
|
9
|
-
@progress_bar = PowerBar.new if total > 0 && show_progress
|
12
|
+
@progress_bar = PowerBar.new if total > 0 && show_progress && !parallel?
|
10
13
|
end
|
11
14
|
|
12
15
|
def show_progress
|
13
16
|
ENV['show_progress'] != 'false'
|
14
17
|
end
|
15
18
|
|
19
|
+
def parallel?
|
20
|
+
ENV['parallel_execution'] == 'true'
|
21
|
+
end
|
22
|
+
|
16
23
|
def show index
|
17
|
-
if
|
18
|
-
@progress_bar
|
24
|
+
if started(index) || regular_interval(index) || complete(index)
|
25
|
+
if @progress_bar
|
26
|
+
msg = "Table: %-15s [ %6d/%-6d ]" % [ @table_name,index,@total]
|
27
|
+
@progress_bar.show(:msg => msg, :done => index, :total => @total)
|
28
|
+
elsif parallel?
|
29
|
+
suffix = ""
|
30
|
+
suffix = "STARTED" if started(index)
|
31
|
+
suffix = "COMPLETE" if complete(index)
|
32
|
+
logger.info("Table: %-15s [ %6d/%-6d ] %s" % [ @table_name,index,@total, suffix])
|
33
|
+
end
|
19
34
|
end
|
20
35
|
end
|
21
36
|
|
37
|
+
def complete index
|
38
|
+
index == @total
|
39
|
+
end
|
40
|
+
|
41
|
+
def regular_interval index
|
42
|
+
index % 1000 == 0
|
43
|
+
end
|
44
|
+
|
45
|
+
def started index
|
46
|
+
index == 1
|
47
|
+
end
|
48
|
+
|
22
49
|
def close
|
23
50
|
@progress_bar.close if @progress_bar
|
24
51
|
end
|
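The reworked `show` method reports only on the first row, on every 1000th row, and on the last row, which keeps log output manageable when the progress bar is disabled during parallel runs. The predicate can be sketched on its own; `report_progress?` is a hypothetical helper, with the interval exposed as a parameter for illustration.

```ruby
# True for the first row, every `interval`-th row, and the final row.
def report_progress?(index, total, interval = 1000)
  index == 1 || (index % interval).zero? || index == total
end

reported = (1..2500).select { |index| report_progress?(index, 2500) }
puts reported.inspect # prints [1, 1000, 2000, 2500]
```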
@@ -0,0 +1,14 @@
|
|
1
|
+
module DataAnon
|
2
|
+
module Utils
|
3
|
+
class RandomStringCharsOnly
|
4
|
+
|
5
|
+
def self.generate length = nil
|
6
|
+
length ||= Random.new.rand 5...15
|
7
|
+
chars = 'abcdefghjkmnpqrstuvwxyz'
|
8
|
+
random_string = ''
|
9
|
+
length.times { random_string << chars[rand(chars.size)] }
|
10
|
+
random_string
|
11
|
+
end
|
12
|
+
end
|
13
|
+
end
|
14
|
+
end
|
data/lib/version.rb
CHANGED
@@ -18,6 +18,7 @@ describe "End 2 End RDBMS Blacklist Acceptance Test using SQLite database" do
|
|
18
18
|
table 'customers' do
|
19
19
|
primary_key 'cust_id'
|
20
20
|
anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')
|
21
|
+
anonymize 'terms_n_condition', 'age'
|
21
22
|
end
|
22
23
|
end
|
23
24
|
|
@@ -22,12 +22,14 @@ describe "End 2 End RDBMS Whitelist Acceptance Test using SQLite database" do
|
|
22
22
|
|
23
23
|
table 'customers' do
|
24
24
|
primary_key 'cust_id'
|
25
|
-
whitelist 'cust_id', 'address', 'zipcode'
|
25
|
+
whitelist 'cust_id', 'address', 'zipcode', 'blog_url'
|
26
26
|
anonymize('first_name').using FieldStrategy::RandomFirstName.new
|
27
27
|
anonymize('last_name').using FieldStrategy::RandomLastName.new
|
28
28
|
anonymize('state').using FieldStrategy::SelectFromList.new(['Gujrat','Karnataka'])
|
29
29
|
anonymize('phone').using FieldStrategy::RandomPhoneNumber.new
|
30
30
|
anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')
|
31
|
+
anonymize 'terms_n_condition', 'age', 'longitude'
|
32
|
+
anonymize('latitude').using FieldStrategy::RandomFloatDelta.new(2.0)
|
31
33
|
end
|
32
34
|
end
|
33
35
|
|
@@ -42,7 +44,10 @@ describe "End 2 End RDBMS Whitelist Acceptance Test using SQLite database" do
|
|
42
44
|
new_rec.zipcode.should == '411048'
|
43
45
|
new_rec.phone.should_not be "9923700662"
|
44
46
|
new_rec.email.should == 'test+1@gmail.com'
|
45
|
-
|
47
|
+
[true,false].should include(new_rec.terms_n_condition)
|
48
|
+
new_rec.age.should be_between(0,100)
|
49
|
+
new_rec.latitude.should be_between( 38.689060, 42.689060)
|
50
|
+
new_rec.longitude.should be_between( -84.044636, -64.044636)
|
46
51
|
|
47
52
|
end
|
48
53
|
end
|
data/spec/spec_helper.rb
CHANGED
@@ -7,7 +7,7 @@ ENV['show_progress'] = 'false'
|
|
7
7
|
|
8
8
|
Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
|
9
9
|
|
10
|
-
DataAnon::Utils::Logging.logger.level = Logger::
|
10
|
+
DataAnon::Utils::Logging.logger.level = Logger::WARN
|
11
11
|
|
12
12
|
RSpec.configure do |config|
|
13
13
|
config.expect_with :rspec
|
@@ -11,7 +11,6 @@ describe FieldStrategy::DateDelta do
|
|
11
11
|
let(:date_difference) {anonymized_value - field.value}
|
12
12
|
|
13
13
|
it { anonymized_value.should be_kind_of Date}
|
14
|
-
it { anonymized_value.should_not == Date.new(2011,4,7) }
|
15
14
|
it { date_difference.should be_between(-5.days, 5.days) }
|
16
15
|
end
|
17
16
|
|
@@ -32,6 +32,13 @@ describe FieldStrategy::DefaultAnon do
|
|
32
32
|
it { anonymized_value.should be_kind_of Fixnum }
|
33
33
|
end
|
34
34
|
|
35
|
+
describe 'anonymized bignum value' do
|
36
|
+
let(:field) {DataAnon::Core::Field.new('int_field',2348723489723847382947,1,nil)}
|
37
|
+
let(:anonymized_value) {DefaultAnon.new.anonymize(field)}
|
38
|
+
|
39
|
+
it { anonymized_value.should be_kind_of Bignum }
|
40
|
+
end
|
41
|
+
|
35
42
|
describe 'anonymized string value' do
|
36
43
|
let(:field) {DataAnon::Core::Field.new('string_field','String',1,nil)}
|
37
44
|
let(:anonymized_value) {DefaultAnon.new.anonymize(field)}
|
@@ -0,0 +1,20 @@
|
|
1
|
+
require "spec_helper"
|
2
|
+
require 'bigdecimal'
|
3
|
+
|
4
|
+
describe FieldStrategy::RandomBigDecimalDelta do
|
5
|
+
|
6
|
+
RandomBigDecimalDelta = FieldStrategy::RandomBigDecimalDelta
|
7
|
+
let(:field) {DataAnon::Core::Field.new('decimal_field',BigDecimal.new("53422342378687687342893.23324"),1,nil)}
|
8
|
+
|
9
|
+
describe 'anonymized big decimal should not be the same as original value' do
|
10
|
+
let(:anonymized_value) {RandomBigDecimalDelta.new.anonymize(field)}
|
11
|
+
|
12
|
+
it {anonymized_value.should_not equal field.value}
|
13
|
+
end
|
14
|
+
|
15
|
+
describe 'anonymized value returned should be big decimal' do
|
16
|
+
let(:anonymized_value) {RandomBigDecimalDelta.new.anonymize(field)}
|
17
|
+
|
18
|
+
it { anonymized_value.should be_kind_of BigDecimal }
|
19
|
+
end
|
20
|
+
end
|
@@ -6,16 +6,14 @@ describe FieldStrategy::RandomFloatDelta do
|
|
6
6
|
let(:field) {DataAnon::Core::Field.new('float_field',5.5,1,nil)}
|
7
7
|
|
8
8
|
describe 'anonymized float should not be the same as original value' do
|
9
|
-
let(:
|
9
|
+
let(:anonymized_value) {RandomFloatDelta.new(5).anonymize(field)}
|
10
10
|
|
11
|
-
it {
|
11
|
+
it {anonymized_value.should_not equal field.value}
|
12
12
|
end
|
13
13
|
|
14
14
|
describe 'anonymized value returned should be a float' do
|
15
|
-
let(:
|
15
|
+
let(:anonymized_value) {RandomFloatDelta.new(5).anonymize(field)}
|
16
16
|
|
17
|
-
it {
|
18
|
-
is_float.should be true
|
19
|
-
}
|
17
|
+
it { anonymized_value.should be_kind_of Float }
|
20
18
|
end
|
21
19
|
end
|
@@ -0,0 +1,15 @@
|
|
1
|
+
require "spec_helper"
|
2
|
+
|
3
|
+
describe FieldStrategy::RandomUrl do
|
4
|
+
|
5
|
+
RandomUrl = FieldStrategy::RandomUrl
|
6
|
+
|
7
|
+
describe 'anonymized url must not be the same as original url' do
|
8
|
+
let(:field) {DataAnon::Core::Field.new('string_field','http://fakeurl.com',1,nil)}
|
9
|
+
let(:anonymized_url) {RandomUrl.new.anonymize(field)}
|
10
|
+
|
11
|
+
it {anonymized_url.should_not equal field.value}
|
12
|
+
it {anonymized_url.should match /https?:\/\/[\S]+/}
|
13
|
+
end
|
14
|
+
|
15
|
+
end
|
@@ -2,17 +2,13 @@ require "spec_helper"
|
|
2
2
|
|
3
3
|
describe FieldStrategy::SelectFromDatabase do
|
4
4
|
|
5
|
-
before(:each) do
|
6
|
-
source = {:adapter => 'sqlite3', :database => 'sample-data/chinook.sqlite'}
|
7
|
-
DataAnon::Utils::SourceDatabase.establish_connection source
|
8
|
-
end
|
9
|
-
|
10
5
|
SelectFromDatabase = FieldStrategy::SelectFromDatabase
|
11
6
|
let(:field) { DataAnon::Core::Field.new('name', 'Abcd', 1, nil) }
|
7
|
+
let(:source) { {:adapter => 'sqlite3', :database => 'sample-data/chinook.sqlite'} }
|
12
8
|
|
13
9
|
describe 'more than one values in predefined list' do
|
14
10
|
|
15
|
-
let(:anonymized_value) { SelectFromDatabase.new('MediaType','Name').anonymize(field) }
|
11
|
+
let(:anonymized_value) { SelectFromDatabase.new('MediaType','Name', source).anonymize(field) }
|
16
12
|
|
17
13
|
it { anonymized_value.should_not be('Abcd') }
|
18
14
|
it { anonymized_value.should_not be_empty }
|
@@ -12,6 +12,11 @@ class CustomerSample
|
|
12
12
|
t.string :zipcode
|
13
13
|
t.string :phone
|
14
14
|
t.string :email
|
15
|
+
t.string :blog_url
|
16
|
+
t.boolean :terms_n_condition
|
17
|
+
t.integer :age
|
18
|
+
t.float :latitude
|
19
|
+
t.float :longitude
|
15
20
|
end
|
16
21
|
end
|
17
22
|
end
|
@@ -30,7 +35,8 @@ class CustomerSample
|
|
30
35
|
SAMPLE_DATA = {:cust_id => 100, :first_name => "Sunit", :last_name => "Parekh",
|
31
36
|
:birth_date => Date.new(1977,7,8), :address => "F 501 Shanti Nagar",
|
32
37
|
:state => "Maharastra", :zipcode => "411048", :phone => "9923700662",
|
33
|
-
:email => "parekh.sunit@gmail.com"
|
38
|
+
:email => "parekh.sunit@gmail.com", :terms_n_condition => true,
|
39
|
+
:age => 34, :longitude => -74.044636, :latitude => +40.689060}
|
34
40
|
|
35
41
|
def self.insert_record connection_spec, data_hash = SAMPLE_DATA
|
36
42
|
DataAnon::Utils::TempDatabase.establish_connection connection_spec
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: data-anonymization
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.3.0
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -11,7 +11,7 @@ authors:
|
|
11
11
|
autorequire:
|
12
12
|
bindir: bin
|
13
13
|
cert_chain: []
|
14
|
-
date: 2012-
|
14
|
+
date: 2012-09-04 00:00:00.000000000 Z
|
15
15
|
dependencies:
|
16
16
|
- !ruby/object:Gem::Dependency
|
17
17
|
name: activerecord
|
@@ -109,6 +109,22 @@ dependencies:
|
|
109
109
|
- - ~>
|
110
110
|
- !ruby/object:Gem::Version
|
111
111
|
version: 1.0.8
|
112
|
+
- !ruby/object:Gem::Dependency
|
113
|
+
name: parallel
|
114
|
+
requirement: !ruby/object:Gem::Requirement
|
115
|
+
none: false
|
116
|
+
requirements:
|
117
|
+
- - ~>
|
118
|
+
- !ruby/object:Gem::Version
|
119
|
+
version: 0.5.18
|
120
|
+
type: :runtime
|
121
|
+
prerelease: false
|
122
|
+
version_requirements: !ruby/object:Gem::Requirement
|
123
|
+
none: false
|
124
|
+
requirements:
|
125
|
+
- - ~>
|
126
|
+
- !ruby/object:Gem::Version
|
127
|
+
version: 0.5.18
|
112
128
|
description: Data anonymization tool for RDBMS databases
|
113
129
|
email:
|
114
130
|
- parekh.sunit@gmail.com
|
@@ -134,6 +150,7 @@ files:
|
|
134
150
|
- lib/core/dsl.rb
|
135
151
|
- lib/core/field.rb
|
136
152
|
- lib/data-anonymization.rb
|
153
|
+
- lib/parallel/table.rb
|
137
154
|
- lib/strategy/base.rb
|
138
155
|
- lib/strategy/blacklist.rb
|
139
156
|
- lib/strategy/field/anonymous.rb
|
@@ -158,6 +175,7 @@ files:
|
|
158
175
|
- lib/strategy/field/name/random_full_name.rb
|
159
176
|
- lib/strategy/field/name/random_last_name.rb
|
160
177
|
- lib/strategy/field/name/random_user_name.rb
|
178
|
+
- lib/strategy/field/number/random_big_decimal_delta.rb
|
161
179
|
- lib/strategy/field/number/random_float.rb
|
162
180
|
- lib/strategy/field/number/random_float_delta.rb
|
163
181
|
- lib/strategy/field/number/random_integer.rb
|
@@ -166,6 +184,7 @@ files:
|
|
166
184
|
- lib/strategy/field/string/formatted_string_numbers.rb
|
167
185
|
- lib/strategy/field/string/lorem_ipsum.rb
|
168
186
|
- lib/strategy/field/string/random_string.rb
|
187
|
+
- lib/strategy/field/string/random_url.rb
|
169
188
|
- lib/strategy/field/string/select_from_database.rb
|
170
189
|
- lib/strategy/field/string/select_from_file.rb
|
171
190
|
- lib/strategy/field/string/select_from_list.rb
|
@@ -181,6 +200,7 @@ files:
|
|
181
200
|
- lib/utils/random_float.rb
|
182
201
|
- lib/utils/random_int.rb
|
183
202
|
- lib/utils/random_string.rb
|
203
|
+
- lib/utils/random_string_chars_only.rb
|
184
204
|
- lib/utils/resource.rb
|
185
205
|
- lib/version.rb
|
186
206
|
- resources/UK_addresses.geojson
|
@@ -210,6 +230,7 @@ files:
|
|
210
230
|
- spec/strategy/field/name/random_full_name_spec.rb
|
211
231
|
- spec/strategy/field/name/random_last_name_spec.rb
|
212
232
|
- spec/strategy/field/name/random_user_name_spec.rb
|
233
|
+
- spec/strategy/field/number/random_big_decimal_delta_spec.rb
|
213
234
|
- spec/strategy/field/number/random_float_delta_spec.rb
|
214
235
|
- spec/strategy/field/number/random_float_spec.rb
|
215
236
|
- spec/strategy/field/number/random_integer_delta_spec.rb
|
@@ -218,6 +239,7 @@ files:
|
|
218
239
|
- spec/strategy/field/string/formatted_string_numbers_spec.rb
|
219
240
|
- spec/strategy/field/string/lorem_ipsum_spec.rb
|
220
241
|
- spec/strategy/field/string/random_string_spec.rb
|
242
|
+
- spec/strategy/field/string/random_url_spec.rb
|
221
243
|
- spec/strategy/field/string/select_from_database_spec.rb
|
222
244
|
- spec/strategy/field/string/select_from_list_spec.rb
|
223
245
|
- spec/strategy/field/string/string_template_spec.rb
|
@@ -277,6 +299,7 @@ test_files:
|
|
277
299
|
- spec/strategy/field/name/random_full_name_spec.rb
|
278
300
|
- spec/strategy/field/name/random_last_name_spec.rb
|
279
301
|
- spec/strategy/field/name/random_user_name_spec.rb
|
302
|
+
- spec/strategy/field/number/random_big_decimal_delta_spec.rb
|
280
303
|
- spec/strategy/field/number/random_float_delta_spec.rb
|
281
304
|
- spec/strategy/field/number/random_float_spec.rb
|
282
305
|
- spec/strategy/field/number/random_integer_delta_spec.rb
|
@@ -285,6 +308,7 @@ test_files:
|
|
285
308
|
- spec/strategy/field/string/formatted_string_numbers_spec.rb
|
286
309
|
- spec/strategy/field/string/lorem_ipsum_spec.rb
|
287
310
|
- spec/strategy/field/string/random_string_spec.rb
|
311
|
+
- spec/strategy/field/string/random_url_spec.rb
|
288
312
|
- spec/strategy/field/string/select_from_database_spec.rb
|
289
313
|
- spec/strategy/field/string/select_from_list_spec.rb
|
290
314
|
- spec/strategy/field/string/string_template_spec.rb
|