data-anonymization 0.2.0 → 0.3.0

data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
  Tool to create anonymized production data dump to use for PERF and other TEST environments.
3
3
 
4
4
  ## Getting started
5
- Install gem using (use `pre` option to tryout edge version):
5
+ Install gem using:
6
6
 
7
7
  $ gem install data-anonymization
8
8
 
@@ -41,23 +41,59 @@ Run using:
41
41
 
42
42
  ## Examples
43
43
 
44
- 1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
45
- 2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
44
+ 1. [Whitelist using Chinook sample database](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
45
+ 2. [Blacklist using Chinook sample database](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
46
+ 3. [Whitelist with composite primary key using DellStore sample database](https://github.com/sunitparekh/test-anonymization/blob/master/dell_whitelist.rb)
47
+ 4. [Blacklist with composite primary key using DellStore sample database](https://github.com/sunitparekh/test-anonymization/blob/master/dell_blacklist.rb)
48
+
49
+ ## Changelog
50
+
51
+ #### 0.3.0 (Sep 4, 2012)
52
+
53
+ Major changes:
54
+
55
+ 1. Added support for Parallel table execution
56
+ 2. Changed the default String strategy from LoremIpsum to RandomString, based on end-user feedback.
57
+ 3. Fixed an issue with table columns named 'type', which is the default column name for STI in ActiveRecord.
58
+
59
+ Please see the [Github 0.3.0 milestone page](https://github.com/sunitparekh/data-anonymization/issues?milestone=1&page=1&state=open) for more details on changes/fixes in release 0.3.0
60
+
61
+ #### 0.2.0 (August 16, 2012)
62
+
63
+ 1. Added a progress bar using the 'powerbar' gem, which also shows the ETA for each table.
64
+ 2. Added more strategies
65
+ 3. Fixed default anonymization strategies for boolean and integer values
66
+ 4. Added support for composite primary key
67
+
68
+ #### 0.1.2 (August 14, 2012)
69
+
70
+ 1. Initial release
71
+
72
+ ## Roadmap
73
+
74
+ #### 0.4.0
75
+
76
+ 1. MongoDB anonymization support (NoSQL document based database support)
77
+
78
+ #### 0.5.0
79
+
80
+ 1. Generate DSL from database and build schema from source as part of Whitelist approach.
46
81
 
47
82
  #### Share feedback
48
83
  Please use Github [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback, feature suggestions and report issues.
49
84
 
50
85
  ## What is data anonymization?
51
86
 
52
- For almost all projects there is a need to have production data dump in order to run performance tests, rehearsal production releases and debugging production issues.
53
- However, getting production data and using it is not feasible due to multiple reasons, one of them being that personal user data would be exposed. And thus arises the need for data anonymization.
87
+ Almost all projects need a production data dump in order to run performance tests, rehearse production releases and debug production issues.
88
+ However, using production data directly is often not feasible for several reasons, the primary one being privacy concerns around user data. Hence the need for data anonymization.
54
89
  This tool helps you to get anonymized production data dump using either Blacklist or Whitelist strategies.
55
90
 
56
91
  ## Anonymization Strategies
57
92
 
58
93
  ### Blacklist
59
94
  This approach essentially leaves all fields unchanged with the exception of those specified by the user, which are scrambled/anonymized (hence the name blacklist).
60
- Blacklist create a copy of prod database and chooses the fields to be anonymized like e.g. username, password, email, name, geo location etc. based on user specification Most of the fields had different rules e.g. password as always set to same value for all users, email needs to be valid.
95
+ The `Blacklist` approach creates a copy of the prod database and anonymizes only the fields chosen by the user, e.g. username, password, email, name, geo location. Most fields have different rules, e.g. the password should be set to the same value for all users, and the email needs to be valid.
96
+
61
97
  The problem with this approach is that when new fields are added they will not be anonymized by default. Human error in omitting users' personal data could be damaging.
62
98
 
63
99
  ```ruby
@@ -70,9 +106,9 @@ end
70
106
 
71
107
  ### Whitelist
72
108
  This approach scrambles/anonymizes all fields by default, except a list of fields which are allowed to be copied as is. Hence the name whitelist.
73
- By default all data needs to be anonymized. So from production database sanitizing the data record by record and insert anonymized data into destination database. Source database need only be readonly.
74
- All fields would be anonymized using default anonymization strategies based on the datatype, unless an anonymization strategy is specified. For instance special strategies could be used for emails, passwords, usernames etc.
75
- A list of whitelisted fields which implies that it's okay to copy the data as is and anonymization isn't required.
109
+ By default all data needs to be anonymized. Data from the production database is sanitized record by record, and the anonymized data is inserted into the destination database. The source database needs to be read-only.
110
+ All fields are anonymized using the default anonymization strategy for their datatype, unless a special anonymization strategy is specified. For instance, special strategies could be used for emails, passwords, usernames etc.
111
+ A whitelisted field implies that it's okay to copy the data as is and anonymization isn't required.
76
112
  This way any new field will be anonymized by default, and if it is needed as is, it must be added to the whitelist explicitly. This prevents human error and protects sensitive information.
77
113
 
78
114
  ```ruby
@@ -84,6 +120,25 @@ database 'DatabaseName' do
84
120
  end
85
121
  ```
86
122
 
123
+ ## Tips
124
+
125
+ 1. In the Whitelist approach, make the source database connection READONLY.
126
+ 2. Change the [default field strategies](#default-field-strategies) to avoid repeating the same strategy again and again in your DSL.
127
+ 3. To run anonymization in parallel at the table level, use the DataAnon::Parallel::Table execution strategy. This requires that there are no FK constraints on the tables.
128
+
129
+ ## Running in Parallel
130
+ Anonymization can currently be run in parallel at the table level, provided there are no FK constraints on the tables.
131
+ It uses the [Parallel gem](https://github.com/grosser/parallel) by Michael Grosser.
132
+ By default it starts multiple parallel Ruby processes, each processing tables one at a time.
133
+ ```ruby
134
+ database 'DellStore' do
135
+ strategy DataAnon::Strategy::Whitelist
136
+ execution_strategy DataAnon::Parallel::Table # by default sequential table processing
137
+ ...
138
+ end
139
+ ```
140
+
141
+
87
142
  ## DataAnon::Core::Field
88
143
  The object that gets passed along with the field strategies.
89
144
 
@@ -386,7 +441,6 @@ write your own anonymous field strategies within DSL,
386
441
  end
387
442
  ```
388
443
 
389
-
390
444
  ## Default field strategies
391
445
 
392
446
  ```ruby
@@ -428,38 +482,6 @@ ENV['show_progress'] = 'false'
428
482
  DataAnon::Utils::Logging.logger.level = Logger::INFO
429
483
  ```
430
484
 
431
- ## Changelog
432
-
433
- #### 0.2.0
434
-
435
- 1. Added the progress bar using 'powerbar' gem. Which also shows the ETA for each table.
436
- 2. Added More strategies
437
- 3. Fixed default anonymization strategies for boolean and integer values
438
- 4. Added support for composite primary key
439
-
440
- #### 0.1.2 (August 14, 2012)
441
-
442
- 1. First initial release
443
-
444
- ## Roadmap
445
-
446
- #### 0.2.0
447
-
448
- 1. Complete list of all the field strategies planned supporting all data types
449
-
450
- #### 0.3.0
451
-
452
- 1. Run anonymization in parallel threads (performance enchantments)
453
-
454
- #### 0.4.0
455
-
456
- 1. MongoDB anonymization support (NoSQL document based database support)
457
-
458
- #### 0.5.0
459
-
460
- 1. Generate DSL from database and build schema from source as part of Whitelist approach.
461
-
462
-
463
485
  ## Want to contribute?
464
486
 
465
487
  1. Fork it
@@ -476,6 +498,8 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
476
498
 
477
499
  - [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make it open source.
478
500
  - [Birinder](https://twitter.com/birinder_) and [Panda](https://twitter.com/sarbashrestha) for reviewing the documentation.
479
-
501
+ - [Dan Abel](http://www.linkedin.com/pub/dan-abel/0/61b/9b0) for introducing me to the Blacklist and Whitelist approaches for data anonymization.
502
+ - [Chirag Doshi](https://twitter.com/chiragsdoshi) for encouraging me to get this done.
503
+ - [Aditya Karle](https://twitter.com/adityakarle) for the Logo. (Coming Soon...)
480
504
 
481
505
 
@@ -23,4 +23,5 @@ Gem::Specification.new do |gem|
23
23
  gem.add_dependency('rgeo', '~> 0.3.15')
24
24
  gem.add_dependency('rgeo-geojson', '~> 0.2.3')
25
25
  gem.add_dependency('powerbar', '~> 1.0.8')
26
+ gem.add_dependency('parallel', '~> 0.5.18')
26
27
  end
data/lib/core/database.rb CHANGED
@@ -7,18 +7,26 @@ module DataAnon
7
7
  @name = name
8
8
  @strategy = DataAnon::Strategy::Whitelist
9
9
  @user_defaults = {}
10
+ @tables = []
11
+ @execution_strategy = DataAnon::Core::Sequential
12
+ ENV['parallel_execution'] = 'false'
10
13
  end
11
14
 
12
15
  def strategy strategy
13
16
  @strategy = strategy
14
17
  end
15
18
 
19
+ def execution_strategy execution_strategy
20
+ @execution_strategy = execution_strategy
21
+ ENV['parallel_execution'] = 'true' if execution_strategy == DataAnon::Parallel::Table
22
+ end
23
+
16
24
  def source_db connection_spec
17
- DataAnon::Utils::SourceDatabase.establish_connection connection_spec
25
+ @source_database = connection_spec
18
26
  end
19
27
 
20
28
  def destination_db connection_spec
21
- DataAnon::Utils::DestinationDatabase.establish_connection connection_spec
29
+ @destination_database = connection_spec
22
30
  end
23
31
 
24
32
  def default_field_strategies default_strategies
@@ -26,10 +34,20 @@ module DataAnon
26
34
  end
27
35
 
28
36
  def table (name, &block)
29
- @strategy.new(name, @user_defaults).process_fields(&block).process
37
+ table = @strategy.new(@source_database, @destination_database, name, @user_defaults).process_fields(&block)
38
+ @tables<< table
39
+ end
40
+
41
+ def anonymize
42
+ @execution_strategy.new.anonymize @tables
30
43
  end
31
44
 
45
+ end
32
46
 
47
+ class Sequential
48
+ def anonymize tables
49
+ tables.each { |table| table.process }
50
+ end
33
51
  end
34
52
 
35
53
  end
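
The `database.rb` change above stops processing each table as soon as it is declared. Instead, tables are collected and then handed to an execution strategy in `anonymize`. The following is a minimal standalone sketch of that pattern, not the gem's code: `RecordingTable`, `SequentialRunner`, `ThreadedRunner` and `MiniDatabase` are hypothetical names, and the threaded runner uses plain Ruby threads as a stand-in for the Parallel gem, which starts separate processes.

```ruby
require 'thread' # harmless on recent Rubies, needed for Queue on older ones

# Hypothetical table object: records its name when processed.
class RecordingTable
  def initialize(name, log)
    @name = name
    @log = log
  end

  def process
    @log << @name
  end
end

# Processes tables one after another, in declaration order.
class SequentialRunner
  def anonymize(tables)
    tables.each(&:process)
  end
end

# Assumption: threads stand in for the processes the Parallel gem would start.
class ThreadedRunner
  def anonymize(tables)
    tables.map { |table| Thread.new { table.process } }.each(&:join)
  end
end

class MiniDatabase
  attr_reader :log

  def initialize
    @tables = []
    @log = Queue.new           # thread-safe log of processed table names
    @runner = SequentialRunner # sequential by default
  end

  def execution_strategy(runner)
    @runner = runner
  end

  # Declaring a table only registers it; nothing is processed yet.
  def table(name)
    @tables << RecordingTable.new(name, @log)
  end

  # Processing happens here, through whichever runner was selected.
  def anonymize
    @runner.new.anonymize(@tables)
  end
end

db = MiniDatabase.new
db.execution_strategy ThreadedRunner
db.table 'customers'
db.table 'orders'
db.anonymize
```

Deferring the work this way is what lets the runner be swapped without touching the table definitions. It is also why parallel execution can only be safe when tables are independent of each other.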
data/lib/core/dsl.rb CHANGED
@@ -5,7 +5,9 @@ module DataAnon
5
5
 
6
6
  def database(name, &block)
7
7
  logger.debug "Processing Database: #{name}"
8
- DataAnon::Core::Database.new(name).instance_eval &block
8
+ database = DataAnon::Core::Database.new(name)
9
+ database.instance_eval &block
10
+ database.anonymize
9
11
  end
10
12
 
11
13
  end
@@ -4,9 +4,11 @@ require "utils/logging"
4
4
  require "utils/random_int"
5
5
  require "utils/random_float"
6
6
  require "utils/random_string"
7
+ require "utils/random_string_chars_only"
7
8
  require "utils/geojson_parser"
8
9
  require "utils/progress_bar"
9
10
  require "utils/resource"
11
+ require "parallel/table"
10
12
  require "core/database"
11
13
  require "core/field"
12
14
  require "strategy/strategies"
@@ -0,0 +1,13 @@
1
+ require 'parallel'
2
+
3
+ module DataAnon
4
+ module Parallel
5
+ class Table
6
+
7
+ def anonymize tables
8
+ ::Parallel.each(tables) { |table| table.process }
9
+ end
10
+
11
+ end
12
+ end
13
+ end
data/lib/strategy/base.rb CHANGED
@@ -1,14 +1,14 @@
1
- require 'powerbar'
2
-
3
1
  module DataAnon
4
2
  module Strategy
5
3
  class Base
6
4
  include Utils::Logging
7
5
 
8
- def initialize name, user_strategies
6
+ def initialize source_database, destination_database, name, user_strategies
9
7
  @name = name
10
8
  @user_strategies = user_strategies
11
9
  @fields = {}
10
+ @source_database = source_database
11
+ @destination_database = destination_database
12
12
  end
13
13
 
14
14
  def process_fields &block
@@ -50,11 +50,15 @@ module DataAnon
50
50
  end
51
51
 
52
52
  def dest_table
53
- @dest_table ||= Utils::DestinationTable.create @name, @primary_keys
53
+ return @dest_table unless @dest_table.nil?
54
+ DataAnon::Utils::DestinationDatabase.establish_connection @destination_database if @destination_database
55
+ @dest_table = Utils::DestinationTable.create @name, @primary_keys
54
56
  end
55
57
 
56
58
  def source_table
57
- @source_table ||= Utils::SourceTable.create @name, @primary_keys
59
+ return @source_table unless @source_table.nil?
60
+ DataAnon::Utils::SourceDatabase.establish_connection @source_database
61
+ @source_table = Utils::SourceTable.create @name, @primary_keys
58
62
  end
59
63
 
60
64
  def process
@@ -4,10 +4,11 @@ module DataAnon
4
4
 
5
5
  class DefaultAnon
6
6
 
7
- DEFAULT_STRATEGIES = {:string => FieldStrategy::LoremIpsum.new,
7
+ DEFAULT_STRATEGIES = {:string => FieldStrategy::RandomString.new,
8
8
  :fixnum => FieldStrategy::RandomIntegerDelta.new(5),
9
9
  :bignum => FieldStrategy::RandomIntegerDelta.new(5000),
10
10
  :float => FieldStrategy::RandomFloatDelta.new(5.0),
11
+ :bigdecimal => FieldStrategy::RandomBigDecimalDelta.new(500.0),
11
12
  :datetime => FieldStrategy::DateTimeDelta.new,
12
13
  :time => FieldStrategy::TimeDelta.new,
13
14
  :date => FieldStrategy::DateDelta.new,
@@ -21,7 +22,7 @@ module DataAnon
21
22
 
22
23
  def anonymize field
23
24
  strategy = @user_defaults[field.value.class.to_s.downcase.to_sym]
24
- raise "No strategy defined for datatype #{field.value.class}" unless strategy
25
+ raise "No strategy defined for datatype #{field.value.class}. Use 'default_field_strategies' option in your script. Refer to http://sunitparekh.github.com/data-anonymization/#default-field-strategies for more details. " unless strategy
25
26
  strategy.anonymize field
26
27
  end
27
28
 
@@ -7,6 +7,7 @@ require 'strategy/field/anonymous'
7
7
  require 'strategy/field/string/lorem_ipsum'
8
8
  require 'strategy/field/string/string_template'
9
9
  require 'strategy/field/string/random_string'
10
+ require 'strategy/field/string/random_url'
10
11
  require 'strategy/field/string/formatted_string_numbers'
11
12
 
12
13
  require 'strategy/field/string/select_from_file'
@@ -18,6 +19,7 @@ require 'strategy/field/number/random_integer'
18
19
  require 'strategy/field/number/random_float'
19
20
  require 'strategy/field/number/random_integer_delta'
20
21
  require 'strategy/field/number/random_float_delta'
22
+ require 'strategy/field/number/random_big_decimal_delta'
21
23
 
22
24
  # contact
23
25
  require 'strategy/field/contact/geojson_base'
@@ -0,0 +1,19 @@
1
+ require 'bigdecimal'
2
+
3
+ module DataAnon
4
+ module Strategy
5
+ module Field
6
+ class RandomBigDecimalDelta
7
+
8
+ def initialize delta = 100.0
9
+ @delta = delta
10
+ end
11
+
12
+ def anonymize field
13
+ return BigDecimal.new("#{field.value + DataAnon::Utils::RandomFloat.generate(-@delta, +@delta)}")
14
+ end
15
+
16
+ end
17
+ end
18
+ end
19
+ end
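
The strategy above shifts the original value by a random amount within plus or minus `delta` and returns a `BigDecimal`. A minimal standalone sketch of the same idea follows. `random_big_decimal_delta` is a hypothetical helper, and `Kernel#rand` with a float range stands in for `DataAnon::Utils::RandomFloat.generate`. The sketch uses the `BigDecimal(...)` conversion method rather than `BigDecimal.new`, which newer versions of the bigdecimal library no longer provide.

```ruby
require 'bigdecimal'

# Hypothetical helper: returns value shifted by a random offset in [-delta, +delta].
def random_big_decimal_delta(value, delta = 100.0)
  offset = rand(-delta..delta)    # Float somewhere within the delta range
  value + BigDecimal(offset.to_s) # convert via String to stay in BigDecimal arithmetic
end

original = BigDecimal('53422342378687687342893.23324')
anonymized = random_big_decimal_delta(original)
within_delta = (anonymized - original).abs <= 100

puts anonymized.class
puts within_delta
```

Converting the offset to `BigDecimal` before adding keeps the large integer part of the original value exact, so only the offset itself carries float precision.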
@@ -8,14 +8,9 @@ module DataAnon
8
8
  end
9
9
 
10
10
  def anonymize field
11
- return range(field.value-@delta,field.value+@delta)
11
+ return field.value + DataAnon::Utils::RandomFloat.generate(-@delta, +@delta)
12
12
  end
13
13
 
14
- def range (min, max)
15
- Random.new.rand * (max-min) + min
16
- end
17
-
18
-
19
14
  end
20
15
  end
21
16
  end
@@ -0,0 +1,30 @@
1
+ module DataAnon
2
+ module Strategy
3
+ module Field
4
+ class RandomUrl
5
+
6
+ def anonymize field
7
+
8
+ url = field.value
9
+ randomized_url = ""
10
+ protocols = url.scan(/http:\/\/|www\./)
11
+ protocols.each do |token|
12
+ url = url.gsub(token,"")
13
+ randomized_url += token
14
+ end
15
+
16
+ marker_position = 0
17
+
18
+ while marker_position < url.length
19
+ special_char_index = url.index(/\W/, marker_position) || url.length
20
+ text = url[marker_position...special_char_index]
21
+ randomized_url += "#{DataAnon::Utils::RandomStringCharsOnly.generate(text.length)}#{url[special_char_index]}"
22
+ marker_position = special_char_index + 1
23
+ end
24
+
25
+ randomized_url
26
+ end
27
+ end
28
+ end
29
+ end
30
+ end
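
The `RandomUrl` strategy above keeps any `http://` and `www.` tokens and every non-word separator (such as `.` and `/`) in place, and replaces each run of word characters with random letters of the same length. A minimal standalone sketch of that algorithm follows. `randomize_url` and `random_letters` are hypothetical names, and `random_letters` is only a stand-in for `DataAnon::Utils::RandomStringCharsOnly.generate`.

```ruby
# Hypothetical stand-in for DataAnon::Utils::RandomStringCharsOnly.generate
def random_letters(length)
  letters = ('a'..'z').to_a
  Array.new(length) { letters.sample }.join
end

def randomize_url(url)
  randomized = ''

  # Copy "http://" and "www." tokens unchanged and drop them from the working copy.
  url.scan(/http:\/\/|www\./).each do |token|
    url = url.gsub(token, '')
    randomized += token
  end

  position = 0
  while position < url.length
    # Find the next non-word character; fall back to the end of the string.
    separator_index = url.index(/\W/, position) || url.length
    word = url[position...separator_index]
    # Replace the word with random letters, then re-append the separator (nil at the end becomes "").
    randomized += "#{random_letters(word.length)}#{url[separator_index]}"
    position = separator_index + 1
  end

  randomized
end

puts randomize_url('http://www.fakeurl.com/path')
```

As in the original, only `http://` is matched as a protocol token. An `https://` URL still keeps its separators, but the letters of the scheme are randomized along with the rest.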
@@ -5,7 +5,8 @@ module DataAnon
5
5
  class SelectFromDatabase
6
6
  include Utils::Logging
7
7
 
8
- def initialize table_name, field_name
8
+ def initialize table_name, field_name, connection_spec
9
+ DataAnon::Utils::SourceDatabase.establish_connection connection_spec
9
10
  source = Utils::SourceTable.create table_name, []
10
11
  @values = source.select(field_name).uniq.collect { |record| record[field_name]}
11
12
  logger.debug "For field strategy #{table_name}:#{field_name} using values #{@values} "
@@ -29,6 +29,7 @@ module DataAnon
29
29
  self.table_name = table_name
30
30
  self.primary_keys = primary_keys if primary_keys.length > 1
31
31
  self.primary_key = primary_keys[0] if primary_keys.length == 1
32
+ self.inheritance_column = :_type_disabled
32
33
  self.mass_assignment_sanitizer = MassAssignmentIgnoreSanitizer.new(self)
33
34
  end
34
35
  end
@@ -1,24 +1,51 @@
1
+ require 'powerbar'
2
+
1
3
  module DataAnon
2
4
  module Utils
3
5
 
4
6
  class ProgressBar
7
+ include Utils::Logging
5
8
 
6
9
  def initialize table_name, total
7
10
  @total = total
8
11
  @table_name = table_name
9
- @progress_bar = PowerBar.new if total > 0 && show_progress
12
+ @progress_bar = PowerBar.new if total > 0 && show_progress && !parallel?
10
13
  end
11
14
 
12
15
  def show_progress
13
16
  ENV['show_progress'] != 'false'
14
17
  end
15
18
 
19
+ def parallel?
20
+ ENV['parallel_execution'] == 'true'
21
+ end
22
+
16
23
  def show index
17
- if @progress_bar && ((index % 1000 == 0) || (index == @total) || (index == 1))
18
- @progress_bar.show(:msg => "Table: #{@table_name} (#{index}/#{@total})", :done => index, :total => @total)
24
+ if started(index) || regular_interval(index) || complete(index)
25
+ if @progress_bar
26
+ msg = "Table: %-15s [ %6d/%-6d ]" % [ @table_name,index,@total]
27
+ @progress_bar.show(:msg => msg, :done => index, :total => @total)
28
+ elsif parallel?
29
+ suffix = ""
30
+ suffix = "STARTED" if started(index)
31
+ suffix = "COMPLETE" if complete(index)
32
+ logger.info("Table: %-15s [ %6d/%-6d ] %s" % [ @table_name,index,@total, suffix])
33
+ end
19
34
  end
20
35
  end
21
36
 
37
+ def complete index
38
+ index == @total
39
+ end
40
+
41
+ def regular_interval index
42
+ index % 1000 == 0
43
+ end
44
+
45
+ def started index
46
+ index == 1
47
+ end
48
+
22
49
  def close
23
50
  @progress_bar.close if @progress_bar
24
51
  end
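
The reworked `ProgressBar#show` reports on the first record, on every 1000th record, and on the last record. It uses the same fixed-width message format whether it draws a bar or, in parallel mode, writes to the log. A minimal standalone sketch of that reporting rule follows; `report_progress?` and `progress_message` are hypothetical helper names.

```ruby
# True on the first record, every 1000th record, and the final record.
def report_progress?(index, total)
  index == 1 || (index % 1000).zero? || index == total
end

# Same format string as in the diff: left-aligned table name, padded counters.
def progress_message(table_name, index, total)
  'Table: %-15s [ %6d/%-6d ]' % [table_name, index, total]
end

total = 2500
(1..total).each do |index|
  puts progress_message('customers', index, total) if report_progress?(index, total)
end
```

The fixed-width fields keep log lines from several tables aligned, which matters in parallel mode where their output is interleaved.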
@@ -0,0 +1,14 @@
1
+ module DataAnon
2
+ module Utils
3
+ class RandomStringCharsOnly
4
+
5
+ def self.generate length = nil
6
+ length ||= Random.new.rand 5...15
7
+ chars = 'abcdefghjkmnpqrstuvwxyz'
8
+ random_string = ''
9
+ length.times { random_string << chars[rand(chars.size)] }
10
+ random_string
11
+ end
12
+ end
13
+ end
14
+ end
data/lib/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module DataAnonymization
2
- VERSION = "0.2.0"
2
+ VERSION = "0.3.0"
3
3
  end
@@ -18,6 +18,7 @@ describe "End 2 End RDBMS Blacklist Acceptance Test using SQLite database" do
18
18
  table 'customers' do
19
19
  primary_key 'cust_id'
20
20
  anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')
21
+ anonymize 'terms_n_condition', 'age'
21
22
  end
22
23
  end
23
24
 
@@ -22,12 +22,14 @@ describe "End 2 End RDBMS Whitelist Acceptance Test using SQLite database" do
22
22
 
23
23
  table 'customers' do
24
24
  primary_key 'cust_id'
25
- whitelist 'cust_id', 'address', 'zipcode'
25
+ whitelist 'cust_id', 'address', 'zipcode', 'blog_url'
26
26
  anonymize('first_name').using FieldStrategy::RandomFirstName.new
27
27
  anonymize('last_name').using FieldStrategy::RandomLastName.new
28
28
  anonymize('state').using FieldStrategy::SelectFromList.new(['Gujrat','Karnataka'])
29
29
  anonymize('phone').using FieldStrategy::RandomPhoneNumber.new
30
30
  anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')
31
+ anonymize 'terms_n_condition', 'age', 'longitude'
32
+ anonymize('latitude').using FieldStrategy::RandomFloatDelta.new(2.0)
31
33
  end
32
34
  end
33
35
 
@@ -42,7 +44,10 @@ describe "End 2 End RDBMS Whitelist Acceptance Test using SQLite database" do
42
44
  new_rec.zipcode.should == '411048'
43
45
  new_rec.phone.should_not be "9923700662"
44
46
  new_rec.email.should == 'test+1@gmail.com'
45
-
47
+ [true,false].should include(new_rec.terms_n_condition)
48
+ new_rec.age.should be_between(0,100)
49
+ new_rec.latitude.should be_between( 38.689060, 42.689060)
50
+ new_rec.longitude.should be_between( -84.044636, -64.044636)
46
51
 
47
52
  end
48
53
  end
data/spec/spec_helper.rb CHANGED
@@ -7,7 +7,7 @@ ENV['show_progress'] = 'false'
7
7
 
8
8
  Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
9
9
 
10
- DataAnon::Utils::Logging.logger.level = Logger::INFO
10
+ DataAnon::Utils::Logging.logger.level = Logger::WARN
11
11
 
12
12
  RSpec.configure do |config|
13
13
  config.expect_with :rspec
@@ -11,7 +11,6 @@ describe FieldStrategy::DateDelta do
11
11
  let(:date_difference) {anonymized_value - field.value}
12
12
 
13
13
  it { anonymized_value.should be_kind_of Date}
14
- it { anonymized_value.should_not == Date.new(2011,4,7) }
15
14
  it { date_difference.should be_between(-5.days, 5.days) }
16
15
  end
17
16
 
@@ -32,6 +32,13 @@ describe FieldStrategy::DefaultAnon do
32
32
  it { anonymized_value.should be_kind_of Fixnum }
33
33
  end
34
34
 
35
+ describe 'anonymized bignum value' do
36
+ let(:field) {DataAnon::Core::Field.new('int_field',2348723489723847382947,1,nil)}
37
+ let(:anonymized_value) {DefaultAnon.new.anonymize(field)}
38
+
39
+ it { anonymized_value.should be_kind_of Bignum }
40
+ end
41
+
35
42
  describe 'anonymized string value' do
36
43
  let(:field) {DataAnon::Core::Field.new('string_field','String',1,nil)}
37
44
  let(:anonymized_value) {DefaultAnon.new.anonymize(field)}
@@ -0,0 +1,20 @@
1
+ require "spec_helper"
2
+ require 'bigdecimal'
3
+
4
+ describe FieldStrategy::RandomBigDecimalDelta do
5
+
6
+ RandomBigDecimalDelta = FieldStrategy::RandomBigDecimalDelta
7
+ let(:field) {DataAnon::Core::Field.new('decimal_field',BigDecimal.new("53422342378687687342893.23324"),1,nil)}
8
+
9
+ describe 'anonymized big decimal should not be the same as original value' do
10
+ let(:anonymized_value) {RandomBigDecimalDelta.new.anonymize(field)}
11
+
12
+ it {anonymized_value.should_not equal field.value}
13
+ end
14
+
15
+ describe 'anonymized value returned should be big decimal' do
16
+ let(:anonymized_value) {RandomBigDecimalDelta.new.anonymize(field)}
17
+
18
+ it { anonymized_value.should be_kind_of BigDecimal }
19
+ end
20
+ end
@@ -6,16 +6,14 @@ describe FieldStrategy::RandomFloatDelta do
6
6
  let(:field) {DataAnon::Core::Field.new('float_field',5.5,1,nil)}
7
7
 
8
8
  describe 'anonymized float should not be the same as original value' do
9
- let(:anonymized_float) {RandomFloatDelta.new(5).anonymize(field)}
9
+ let(:anonymized_value) {RandomFloatDelta.new(5).anonymize(field)}
10
10
 
11
- it {anonymized_float.should_not equal field.value}
11
+ it {anonymized_value.should_not equal field.value}
12
12
  end
13
13
 
14
14
  describe 'anonymized value returned should be a float' do
15
- let(:anonymized_float) {RandomFloatDelta.new(5).anonymize(field)}
15
+ let(:anonymized_value) {RandomFloatDelta.new(5).anonymize(field)}
16
16
 
17
- it { is_float = anonymized_float.is_a? Float
18
- is_float.should be true
19
- }
17
+ it { anonymized_value.should be_kind_of Float }
20
18
  end
21
19
  end
@@ -0,0 +1,15 @@
1
+ require "spec_helper"
2
+
3
+ describe FieldStrategy::RandomUrl do
4
+
5
+ RandomUrl = FieldStrategy::RandomUrl
6
+
7
+ describe 'anonymized url must not be the same as original url' do
8
+ let(:field) {DataAnon::Core::Field.new('string_field','http://fakeurl.com',1,nil)}
9
+ let(:anonymized_url) {RandomUrl.new.anonymize(field)}
10
+
11
+ it {anonymized_url.should_not equal field.value}
12
+ it {anonymized_url.should match /https?:\/\/[\S]+/}
13
+ end
14
+
15
+ end
@@ -2,17 +2,13 @@ require "spec_helper"
2
2
 
3
3
  describe FieldStrategy::SelectFromDatabase do
4
4
 
5
- before(:each) do
6
- source = {:adapter => 'sqlite3', :database => 'sample-data/chinook.sqlite'}
7
- DataAnon::Utils::SourceDatabase.establish_connection source
8
- end
9
-
10
5
  SelectFromDatabase = FieldStrategy::SelectFromDatabase
11
6
  let(:field) { DataAnon::Core::Field.new('name', 'Abcd', 1, nil) }
7
+ let(:source) { {:adapter => 'sqlite3', :database => 'sample-data/chinook.sqlite'} }
12
8
 
13
9
  describe 'more than one values in predefined list' do
14
10
 
15
- let(:anonymized_value) { SelectFromDatabase.new('MediaType','Name').anonymize(field) }
11
+ let(:anonymized_value) { SelectFromDatabase.new('MediaType','Name', source).anonymize(field) }
16
12
 
17
13
  it { anonymized_value.should_not be('Abcd') }
18
14
  it { anonymized_value.should_not be_empty }
@@ -12,6 +12,11 @@ class CustomerSample
12
12
  t.string :zipcode
13
13
  t.string :phone
14
14
  t.string :email
15
+ t.string :blog_url
16
+ t.boolean :terms_n_condition
17
+ t.integer :age
18
+ t.float :latitude
19
+ t.float :longitude
15
20
  end
16
21
  end
17
22
  end
@@ -30,7 +35,8 @@ class CustomerSample
30
35
  SAMPLE_DATA = {:cust_id => 100, :first_name => "Sunit", :last_name => "Parekh",
31
36
  :birth_date => Date.new(1977,7,8), :address => "F 501 Shanti Nagar",
32
37
  :state => "Maharastra", :zipcode => "411048", :phone => "9923700662",
33
- :email => "parekh.sunit@gmail.com"}
38
+ :email => "parekh.sunit@gmail.com", :terms_n_condition => true,
39
+ :age => 34, :longitude => -74.044636, :latitude => +40.689060}
34
40
 
35
41
  def self.insert_record connection_spec, data_hash = SAMPLE_DATA
36
42
  DataAnon::Utils::TempDatabase.establish_connection connection_spec
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data-anonymization
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -11,7 +11,7 @@ authors:
11
11
  autorequire:
12
12
  bindir: bin
13
13
  cert_chain: []
14
- date: 2012-08-17 00:00:00.000000000 Z
14
+ date: 2012-09-04 00:00:00.000000000 Z
15
15
  dependencies:
16
16
  - !ruby/object:Gem::Dependency
17
17
  name: activerecord
@@ -109,6 +109,22 @@ dependencies:
109
109
  - - ~>
110
110
  - !ruby/object:Gem::Version
111
111
  version: 1.0.8
112
+ - !ruby/object:Gem::Dependency
113
+ name: parallel
114
+ requirement: !ruby/object:Gem::Requirement
115
+ none: false
116
+ requirements:
117
+ - - ~>
118
+ - !ruby/object:Gem::Version
119
+ version: 0.5.18
120
+ type: :runtime
121
+ prerelease: false
122
+ version_requirements: !ruby/object:Gem::Requirement
123
+ none: false
124
+ requirements:
125
+ - - ~>
126
+ - !ruby/object:Gem::Version
127
+ version: 0.5.18
112
128
  description: Data anonymization tool for RDBMS databases
113
129
  email:
114
130
  - parekh.sunit@gmail.com
@@ -134,6 +150,7 @@ files:
134
150
  - lib/core/dsl.rb
135
151
  - lib/core/field.rb
136
152
  - lib/data-anonymization.rb
153
+ - lib/parallel/table.rb
137
154
  - lib/strategy/base.rb
138
155
  - lib/strategy/blacklist.rb
139
156
  - lib/strategy/field/anonymous.rb
@@ -158,6 +175,7 @@ files:
158
175
  - lib/strategy/field/name/random_full_name.rb
159
176
  - lib/strategy/field/name/random_last_name.rb
160
177
  - lib/strategy/field/name/random_user_name.rb
178
+ - lib/strategy/field/number/random_big_decimal_delta.rb
161
179
  - lib/strategy/field/number/random_float.rb
162
180
  - lib/strategy/field/number/random_float_delta.rb
163
181
  - lib/strategy/field/number/random_integer.rb
@@ -166,6 +184,7 @@ files:
166
184
  - lib/strategy/field/string/formatted_string_numbers.rb
167
185
  - lib/strategy/field/string/lorem_ipsum.rb
168
186
  - lib/strategy/field/string/random_string.rb
187
+ - lib/strategy/field/string/random_url.rb
169
188
  - lib/strategy/field/string/select_from_database.rb
170
189
  - lib/strategy/field/string/select_from_file.rb
171
190
  - lib/strategy/field/string/select_from_list.rb
@@ -181,6 +200,7 @@ files:
181
200
  - lib/utils/random_float.rb
182
201
  - lib/utils/random_int.rb
183
202
  - lib/utils/random_string.rb
203
+ - lib/utils/random_string_chars_only.rb
184
204
  - lib/utils/resource.rb
185
205
  - lib/version.rb
186
206
  - resources/UK_addresses.geojson
@@ -210,6 +230,7 @@ files:
210
230
  - spec/strategy/field/name/random_full_name_spec.rb
211
231
  - spec/strategy/field/name/random_last_name_spec.rb
212
232
  - spec/strategy/field/name/random_user_name_spec.rb
233
+ - spec/strategy/field/number/random_big_decimal_delta_spec.rb
213
234
  - spec/strategy/field/number/random_float_delta_spec.rb
214
235
  - spec/strategy/field/number/random_float_spec.rb
215
236
  - spec/strategy/field/number/random_integer_delta_spec.rb
@@ -218,6 +239,7 @@ files:
218
239
  - spec/strategy/field/string/formatted_string_numbers_spec.rb
219
240
  - spec/strategy/field/string/lorem_ipsum_spec.rb
220
241
  - spec/strategy/field/string/random_string_spec.rb
242
+ - spec/strategy/field/string/random_url_spec.rb
221
243
  - spec/strategy/field/string/select_from_database_spec.rb
222
244
  - spec/strategy/field/string/select_from_list_spec.rb
223
245
  - spec/strategy/field/string/string_template_spec.rb
@@ -277,6 +299,7 @@ test_files:
277
299
  - spec/strategy/field/name/random_full_name_spec.rb
278
300
  - spec/strategy/field/name/random_last_name_spec.rb
279
301
  - spec/strategy/field/name/random_user_name_spec.rb
302
+ - spec/strategy/field/number/random_big_decimal_delta_spec.rb
280
303
  - spec/strategy/field/number/random_float_delta_spec.rb
281
304
  - spec/strategy/field/number/random_float_spec.rb
282
305
  - spec/strategy/field/number/random_integer_delta_spec.rb
@@ -285,6 +308,7 @@ test_files:
285
308
  - spec/strategy/field/string/formatted_string_numbers_spec.rb
286
309
  - spec/strategy/field/string/lorem_ipsum_spec.rb
287
310
  - spec/strategy/field/string/random_string_spec.rb
311
+ - spec/strategy/field/string/random_url_spec.rb
288
312
  - spec/strategy/field/string/select_from_database_spec.rb
289
313
  - spec/strategy/field/string/select_from_list_spec.rb
290
314
  - spec/strategy/field/string/string_template_spec.rb