data-anonymization 0.1.2 → 0.2.0

Files changed (103)
  1. data/.documentup.json +1 -0
  2. data/.travis.yml +0 -1
  3. data/README.md +277 -52
  4. data/blacklist_dsl.rb +1 -3
  5. data/data-anonymization.gemspec +4 -0
  6. data/lib/core/dsl.rb +1 -1
  7. data/lib/data-anonymization.rb +3 -0
  8. data/lib/strategy/base.rb +21 -11
  9. data/lib/strategy/blacklist.rb +2 -1
  10. data/lib/strategy/field/contact/geojson_base.rb +24 -0
  11. data/lib/strategy/field/contact/random_address.rb +17 -0
  12. data/lib/strategy/field/contact/random_city.rb +17 -0
  13. data/lib/strategy/field/contact/random_phone_number.rb +13 -0
  14. data/lib/strategy/field/contact/random_province.rb +17 -0
  15. data/lib/strategy/field/contact/random_zipcode.rb +17 -0
  16. data/lib/strategy/field/datetime/anonymize_date.rb +39 -0
  17. data/lib/strategy/field/datetime/anonymize_datetime.rb +15 -0
  18. data/lib/strategy/field/datetime/anonymize_time.rb +58 -0
  19. data/lib/strategy/field/datetime/date_delta.rb +21 -0
  20. data/lib/strategy/field/{date_time_delta.rb → datetime/date_time_delta.rb} +3 -3
  21. data/lib/strategy/field/datetime/time_delta.rb +12 -0
  22. data/lib/strategy/field/default_anon.rb +12 -7
  23. data/lib/strategy/field/email/gmail_template.rb +16 -0
  24. data/lib/strategy/field/{random_email.rb → email/random_email.rb} +0 -0
  25. data/lib/strategy/field/{random_mailinator_email.rb → email/random_mailinator_email.rb} +0 -2
  26. data/lib/strategy/field/fields.rb +51 -20
  27. data/lib/strategy/field/name/random_first_name.rb +14 -0
  28. data/lib/strategy/field/{random_full_name.rb → name/random_full_name.rb} +0 -0
  29. data/lib/strategy/field/name/random_last_name.rb +14 -0
  30. data/lib/strategy/field/{random_user_name.rb → name/random_user_name.rb} +0 -0
  31. data/lib/strategy/field/number/random_float.rb +23 -0
  32. data/lib/strategy/field/{random_float_delta.rb → number/random_float_delta.rb} +2 -4
  33. data/lib/strategy/field/{random_int.rb → number/random_integer.rb} +1 -1
  34. data/lib/strategy/field/{random_integer_delta.rb → number/random_integer_delta.rb} +2 -5
  35. data/lib/strategy/field/{random_phone_number.rb → string/formatted_string_numbers.rb} +4 -1
  36. data/lib/strategy/field/{lorem_ipsum.rb → string/lorem_ipsum.rb} +0 -0
  37. data/lib/strategy/field/{random_string.rb → string/random_string.rb} +0 -0
  38. data/lib/strategy/field/{distinct_column_values.rb → string/select_from_database.rb} +2 -3
  39. data/lib/strategy/field/string/select_from_file.rb +18 -0
  40. data/lib/strategy/field/string/select_from_list.rb +17 -0
  41. data/lib/strategy/field/{string_template.rb → string/string_template.rb} +0 -0
  42. data/lib/strategy/whitelist.rb +4 -2
  43. data/lib/utils/database.rb +8 -6
  44. data/lib/utils/geojson_parser.rb +42 -0
  45. data/lib/utils/logging.rb +0 -9
  46. data/lib/utils/progress_bar.rb +29 -0
  47. data/lib/utils/random_float.rb +12 -0
  48. data/lib/utils/random_int.rb +3 -7
  49. data/lib/utils/resource.rb +4 -0
  50. data/lib/version.rb +1 -1
  51. data/resources/UK_addresses.geojson +300 -0
  52. data/resources/US_addresses.geojson +300 -0
  53. data/spec/acceptance/rdbms_blacklist_spec.rb +2 -2
  54. data/spec/acceptance/rdbms_whitelist_spec.rb +6 -8
  55. data/spec/resource/sample.geojson +1 -0
  56. data/spec/spec_helper.rb +3 -2
  57. data/spec/strategy/field/contact/random_address_spec.rb +12 -0
  58. data/spec/strategy/field/contact/random_city_spec.rb +14 -0
  59. data/spec/strategy/field/contact/random_phone_number_spec.rb +16 -0
  60. data/spec/strategy/field/contact/random_province_spec.rb +14 -0
  61. data/spec/strategy/field/contact/random_zipcode_spec.rb +14 -0
  62. data/spec/strategy/field/datetime/anonymize_date_spec.rb +27 -0
  63. data/spec/strategy/field/datetime/anonymize_datetime_spec.rb +57 -0
  64. data/spec/strategy/field/datetime/anonymize_time_spec.rb +57 -0
  65. data/spec/strategy/field/datetime/date_delta_spec.rb +36 -0
  66. data/spec/strategy/field/{date_time_delta_spec.rb → datetime/date_time_delta_spec.rb} +3 -2
  67. data/spec/strategy/field/datetime/time_delta_spec.rb +44 -0
  68. data/spec/strategy/field/default_anon_spec.rb +42 -0
  69. data/spec/strategy/field/email/gmail_template_spec.rb +17 -0
  70. data/spec/strategy/field/{random_email_spec.rb → email/random_email_spec.rb} +2 -2
  71. data/spec/strategy/field/email/random_mailinator_email_spec.rb +14 -0
  72. data/spec/strategy/field/{random_first_name_spec.rb → name/random_first_name_spec.rb} +2 -2
  73. data/spec/strategy/field/{random_full_name_spec.rb → name/random_full_name_spec.rb} +2 -2
  74. data/spec/strategy/field/{random_last_name_spec.rb → name/random_last_name_spec.rb} +2 -2
  75. data/spec/strategy/field/{random_user_name_spec.rb → name/random_user_name_spec.rb} +2 -2
  76. data/spec/strategy/field/{random_float_delta_spec.rb → number/random_float_delta_spec.rb} +2 -2
  77. data/spec/strategy/field/number/random_float_spec.rb +28 -0
  78. data/spec/strategy/field/{random_integer_delta_spec.rb → number/random_integer_delta_spec.rb} +3 -5
  79. data/spec/strategy/field/{random_int_spec.rb → number/random_integer_spec.rb} +4 -4
  80. data/spec/strategy/field/random_boolean_spec.rb +2 -2
  81. data/spec/strategy/field/string/formatted_string_numbers_spec.rb +15 -0
  82. data/spec/strategy/field/{lorem_ipsum_spec.rb → string/lorem_ipsum_spec.rb} +2 -2
  83. data/spec/strategy/field/{random_string_spec.rb → string/random_string_spec.rb} +2 -2
  84. data/spec/strategy/field/{distinct_column_values_spec.rb → string/select_from_database_spec.rb} +3 -3
  85. data/spec/strategy/field/{random_selection_spec.rb → string/select_from_list_spec.rb} +5 -5
  86. data/spec/strategy/field/{string_template_spec.rb → string/string_template_spec.rb} +2 -2
  87. data/spec/strategy/field/whitelist_spec.rb +2 -2
  88. data/spec/support/customer_sample.rb +1 -1
  89. data/spec/utils/database_spec.rb +2 -2
  90. data/spec/utils/geojson_parser_spec.rb +38 -0
  91. data/whitelist_dsl.rb +4 -6
  92. metadata +163 -59
  93. data/lib/strategy/field/anonymize_time.rb +0 -57
  94. data/lib/strategy/field/gmail_template.rb +0 -17
  95. data/lib/strategy/field/random_first_name.rb +0 -18
  96. data/lib/strategy/field/random_last_name.rb +0 -19
  97. data/lib/strategy/field/random_selection.rb +0 -23
  98. data/lib/strategy/field/user_name_template.rb +0 -22
  99. data/spec/strategy/field/anonymize_time_spec.rb +0 -23
  100. data/spec/strategy/field/gmail_template_spec.rb +0 -14
  101. data/spec/strategy/field/random_mailinator_email_spec.rb +0 -21
  102. data/spec/strategy/field/random_phone_number_spec.rb +0 -35
  103. data/spec/strategy/field/user_name_template_spec.rb +0 -13
data/.documentup.json CHANGED
@@ -2,6 +2,7 @@
2
2
  "repo": "sunitparekh/data-anonymization",
3
3
  "name": "Data Anonymization",
4
4
  "theme": "v1",
5
+ "color": "#336699",
5
6
  "travis": true,
6
7
  "twitter": ["dataanon"],
7
8
  "google_analytics":"UA-34000799-1"
data/.travis.yml CHANGED
@@ -1,5 +1,4 @@
1
1
  language: ruby
2
- before_install: gem install bundler --pre
3
2
  before_script: rake empty_dest
4
3
  rvm:
5
4
  - 1.9.2
data/README.md CHANGED
@@ -1,16 +1,19 @@
1
1
  # Data::Anonymization
2
- Tool to create anonymized production data dump to use for PREF and other TEST environments.
2
+ Tool to create anonymized production data dump to use for PERF and other TEST environments.
3
3
 
4
4
  ## Getting started
5
- Install gem using:
5
+ Install gem using (use `pre` option to try out the edge version):
6
6
 
7
7
  $ gem install data-anonymization
8
8
 
9
+ Install required database adapter library for active record:
10
+
11
+ $ gem install sqlite3
12
+
9
13
  Create ruby program using data-anonymization DSL as following `my_dsl.rb`:
10
14
 
11
15
  ```ruby
12
16
  require 'data-anonymization'
13
- DF = DataAnon::Strategy::Field
14
17
 
15
18
  database 'DatabaseName' do
16
19
  strategy DataAnon::Strategy::Blacklist # whitelist (default) or blacklist
@@ -18,10 +21,12 @@ database 'DatabaseName' do
18
21
  # database config as active record connection hash
19
22
  source_db :adapter => 'sqlite3', :database => 'sample-data/chinook-empty.sqlite'
20
23
 
24
+ # User -> table name (case sensitive)
21
25
  table 'User' do
22
- primary_key 'id'
23
- anonymize 'DateOfBirth' # uses default anonymization based on data types
24
- anonymize('UserName').using DF::StringTemplate.new('user#{row_number}')
26
+ # id, DateOfBirth, FirstName, LastName, UserName, Password -> table column names (case sensitive)
27
+ primary_key 'id' # composite key is also supported
28
+ anonymize 'DateOfBirth','FirstName','LastName' # uses default anonymization based on data types
29
+ anonymize('UserName').using FieldStrategy::StringTemplate.new('user#{row_number}')
25
30
  anonymize('Password') { |field| "password" }
26
31
  end
27
32
 
@@ -34,23 +39,26 @@ Run using:
34
39
 
35
40
  $ ruby my_dsl.rb
36
41
 
37
- ### Share feedback
38
- Please use Github [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback, feature suggestions or found any issues.
42
+ ## Examples
43
+
44
+ 1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
45
+ 2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
39
46
 
40
- Read more to learn all the features of the tool...
47
+ #### Share feedback
48
+ Please use Github [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback, feature suggestions and report issues.
41
49
 
42
50
  ## What is data anonymization?
43
51
 
44
- For almost all the project it is almost a need to have production data dump to run performance tests, rehearsal production releases and debugging production issues.
45
- However, getting production data and using it is not feasible due to multiple reasons and one of them is users personal data in database. And hence the need of data anonymization.
52
+ For almost all projects there is a need to have a production data dump in order to run performance tests, rehearse production releases and debug production issues.
53
+ However, getting production data and using it is not feasible due to multiple reasons, one of them being that personal user data would be exposed. And thus arises the need for data anonymization.
46
54
  This tool helps you to get anonymized production data dump using either Blacklist or Whitelist strategies.
47
55
 
48
56
  ## Anonymization Strategies
49
57
 
50
58
  ### Blacklist
51
- This approach is essentially to leave all fields unchanged with the exception of a few which are scrambled/anonymized (hence the name blacklist).
52
- Blacklist create a copy of prod database and choose the fields to be anonymized like e.g. username, password, email, name, geo location etc. Most of the fields had different rules e.g. password as always set to same value for all user, email need to be valid email (we used gmail trick with +N appended to it).
53
- Problem with this approach is, if new fields are added it will not be anonymized be default. Risk of user personal data passing through in future.
59
+ This approach essentially leaves all fields unchanged with the exception of those specified by the user, which are scrambled/anonymized (hence the name blacklist).
60
+ Blacklist creates a copy of the prod database and anonymizes the fields specified by the user, e.g. username, password, email, name, geo location etc. Most of these fields have different rules, e.g. password is always set to the same value for all users, email needs to be valid.
61
+ The problem with this approach is that when new fields are added they will not be anonymized by default. Human error in omitting users' personal data could be damaging.
54
62
 
55
63
  ```ruby
56
64
  database 'DatabaseName' do
@@ -61,10 +69,11 @@ end
61
69
  ```
62
70
 
63
71
  ### Whitelist
64
- This approach is essentially to scramble/anonymize all fields except list of fields which are allowed to copy called as whitelist.
65
- By default all data needs to be anonymized. So from production database sanitizing the data record by record and insert anonymized data into destination database. Source database is kind of readonly.
66
- Have default anonymization rules based on data types. Have special rules for fields like username, password, email, name, geo location etc. And have list of whitelist fields means its okay to copy the data and doesn't need anonymization.
67
- This way any new field will be default get anonymized and if we need them as is, add it to the whitelist explicitly.
72
+ This approach by default scrambles/anonymizes all fields except a list of fields which are allowed to be copied as is. Hence the name whitelist.
73
+ By default all data needs to be anonymized. So data from the production database is sanitized record by record and the anonymized data is inserted into the destination database. Source database need only be readonly.
74
+ All fields would be anonymized using default anonymization strategies based on the datatype, unless an anonymization strategy is specified. For instance special strategies could be used for emails, passwords, usernames etc.
75
+ Whitelisted fields are those for which it's okay to copy the data as is, so anonymization isn't required.
76
+ This way any new field will be anonymized by default and if we need them as is, add it to the whitelist explicitly. This prevents any human error and protects sensitive information.
68
77
 
69
78
  ```ruby
70
79
  database 'DatabaseName' do
@@ -91,70 +100,266 @@ has following attribute accessor
91
100
  Default anonymization strategy for `string` content. Uses default 'Lorem ipsum...' text or text supplied in strategy to generate same length string.
92
101
 
93
102
  ```ruby
94
- anonymize('UserName').using DataAnon::Strategy::Field::LoremIpsum.new
103
+ anonymize('UserName').using FieldStrategy::LoremIpsum.new
95
104
  ```
96
105
  ```ruby
97
- anonymize('UserName').using DataAnon::Strategy::Field::LoremIpsum.new("very large string....")
106
+ anonymize('UserName').using FieldStrategy::LoremIpsum.new("very large string....")
98
107
  ```
99
108
  ```ruby
100
- anonymize('UserName').using DataAnon::Strategy::Field::LoremIpsum.new(File.read('my_file.txt'))
109
+ anonymize('UserName').using FieldStrategy::LoremIpsum.new(File.read('my_file.txt'))
101
110
  ```
102
111
 
103
112
  ### RandomString
104
113
  Generates random string of same length.
105
114
  ```ruby
106
- anonymize('UserName').using DataAnon::Strategy::Field::RandomString.new
115
+ anonymize('UserName').using FieldStrategy::RandomString.new
107
116
  ```
108
117
 
109
118
  ### StringTemplate
110
119
  Simple string evaluation within [DataAnon::Core::Field](#dataanon-core-field) context. Can be used for email, username anonymization.
111
120
  Make sure to put the string in 'single quote' else it will get evaluated inline.
112
121
  ```ruby
113
- anonymize('UserName').using DataAnon::Strategy::Field::StringTemplate.new('user#{row_number}')
122
+ anonymize('UserName').using FieldStrategy::StringTemplate.new('user#{row_number}')
123
+ ```
124
+ ```ruby
125
+ anonymize('Email').using FieldStrategy::StringTemplate.new('valid.address+#{row_number}@gmail.com')
126
+ ```
127
+ ```ruby
128
+ anonymize('Email').using FieldStrategy::StringTemplate.new('useremail#{row_number}@mailinator.com')
129
+ ```
130
+
131
+ ### SelectFromList
132
+ Select randomly one of the values specified.
133
+ ```ruby
134
+ anonymize('State').using FieldStrategy::SelectFromList.new(['New York','Georgia',...])
135
+ ```
136
+ ```ruby
137
+ anonymize('NameTitle').using FieldStrategy::SelectFromList.new(['Mr','Mrs','Dr',...])
138
+ ```
139
+
140
+ ### SelectFromFile
141
+ Similar to SelectFromList; the only difference is that the list of values is picked up from a file. A classic usage is anonymizing a states field.
142
+ ```ruby
143
+ anonymize('State').using FieldStrategy::SelectFromFile.new('states.txt')
144
+ ```
145
+
146
+ ### FormattedStringNumber
147
+ Keeps the format the same and replaces each digit in the string with a random digit.
148
+ ```ruby
149
+ anonymize('CreditCardNumber').using FieldStrategy::FormattedStringNumber.new
150
+ ```
151
+
152
+ ### SelectFromDatabase
153
+ Similar to SelectFromList; the difference is that the list of values is collected from the database table using a distinct column query.
154
+ ```ruby
155
+ # values are collected using `select distinct state from customers` query
156
+ anonymize('State').using FieldStrategy::SelectFromDatabase.new('customers','state')
157
+ ```
158
+
159
+ ### RandomAddress
160
+ Generates address using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
161
+ The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
162
+ ```ruby
163
+ anonymize('Address').using FieldStrategy::RandomAddress.region_US
164
+ ```
165
+ ```ruby
166
+ anonymize('Address').using FieldStrategy::RandomAddress.region_UK
167
+ ```
168
+ ```ruby
169
+ # get your own geo_json file and use it
170
+ anonymize('Address').using FieldStrategy::RandomAddress.new('my_geo_json.json')
171
+ ```
172
+
173
+ ### RandomCity
174
+ Similar to RandomAddress, generates city using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
175
+ The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
176
+ ```ruby
177
+ anonymize('City').using FieldStrategy::RandomCity.region_US
178
+ ```
179
+ ```ruby
180
+ anonymize('City').using FieldStrategy::RandomCity.region_UK
181
+ ```
182
+ ```ruby
183
+ # get your own geo_json file and use it
184
+ anonymize('City').using FieldStrategy::RandomCity.new('my_geo_json.json')
185
+ ```
186
+
187
+ ### RandomProvince
188
+ Similar to RandomAddress, generates province using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
189
+ The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
190
+ ```ruby
191
+ anonymize('Province').using FieldStrategy::RandomProvince.region_US
192
+ ```
193
+ ```ruby
194
+ anonymize('Province').using FieldStrategy::RandomProvince.region_UK
195
+ ```
196
+ ```ruby
197
+ # get your own geo_json file and use it
198
+ anonymize('Province').using FieldStrategy::RandomProvince.new('my_geo_json.json')
199
+ ```
200
+
201
+ ### RandomZipcode
202
+ Similar to RandomAddress, generates zipcode using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
203
+ The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
204
+ ```ruby
205
+ anonymize('Address').using FieldStrategy::RandomZipcode.region_US
206
+ ```
207
+ ```ruby
208
+ anonymize('Address').using FieldStrategy::RandomZipcode.region_UK
209
+ ```
210
+ ```ruby
211
+ # get your own geo_json file and use it
212
+ anonymize('Address').using FieldStrategy::RandomZipcode.new('my_geo_json.json')
213
+ ```
214
+
215
+ ### RandomPhoneNumber
216
+ Keeps the format the same and replaces each digit in the string with a random digit.
217
+ ```ruby
218
+ anonymize('PhoneNumber').using FieldStrategy::RandomPhoneNumber.new
219
+ ```
220
+
221
+ ### AnonymizeDateTime
222
+ Anonymizes each field (except year and seconds) within the natural range (e.g. hour between 1-24 and day within the month) based on true/false
223
+ input for that field. By default, all fields are anonymized.
224
+ ```ruby
225
+ #anonymizes month and hour fields, leaving the day and minute fields untouched
226
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.new(true,false,true,false)
227
+ ```
228
+
229
+ In addition to customizing which fields you want anonymized, there are some helper methods which allow for quick anonymization
230
+ ```ruby
231
+ # anonymizes only the month field
232
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_month
233
+ # anonymizes only the day field
234
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_day
235
+ # anonymizes only the hour field
236
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_hour
237
+ # anonymizes only the minute field
238
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_minute
114
239
  ```
240
+
241
+ ### AnonymizeTime
242
+ Exactly similar to the above DateTime strategy, except that the returned object is of type `Time`
243
+
244
+ ### AnonymizeDate
245
+ Anonymizes day and month fields within natural range based on true/false input for that field. By default both fields are
246
+ anonymized
115
247
  ```ruby
116
- anonymize('Email').using DataAnon::Strategy::Field::StringTemplate.new('valid.address+#{row_number}@gmail.com')
248
+ # anonymizes month and leaves day unchanged
249
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.new(true,false)
117
250
  ```
251
+
252
+ In addition to customizing which fields you want anonymized, there are some helper methods which allow for quick anonymization
118
253
  ```ruby
119
- anonymize('Email').using DataAnon::Strategy::Field::StringTemplate.new('useremail#{row_number}@mailinator.com')
254
+ # anonymizes only the month field
255
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.only_month
256
+ # anonymizes only the day field
257
+ anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.only_day
120
258
  ```
121
259
 
122
260
  ### DateTimeDelta
123
261
  Shifts date and time randomly within given range. By default shifts date within 10 days + or - and shifts time within 30 minutes.
124
262
  ```ruby
125
- anonymize('DateOfBirth').using DataAnon::Strategy::Field::DateTimeDelta.new
263
+ anonymize('DateOfBirth').using FieldStrategy::DateTimeDelta.new
126
264
  ```
127
265
  ```ruby
128
266
  # shifts date within 20 days and time within 50 minutes
129
- anonymize('DateOfBirth').using DataAnon::Strategy::Field::DateTimeDelta.new(20, 50)
267
+ anonymize('DateOfBirth').using FieldStrategy::DateTimeDelta.new(20, 50)
268
+ ```
269
+
270
+ ### TimeDelta
271
+ Exactly similar to the above DateTime strategy, except that the returned object is of type `Time`
272
+
273
+ ### DateDelta
274
+
275
+ Shifts date randomly within given delta range. By default shifts date within 10 days + or -
276
+ ```ruby
277
+ anonymize('DateOfBirth').using FieldStrategy::DateDelta.new
278
+ ```
279
+ ```ruby
280
+ # shifts date within 25 days
281
+ anonymize('DateOfBirth').using FieldStrategy::DateDelta.new(25)
130
282
  ```
131
283
 
132
284
  ### RandomEmail
133
285
  Generates email randomly using the given HOSTNAME and TLD.
134
286
  By default generates hostname randomly along with email id.
135
287
  ```ruby
136
- anonymize('DateOfBirth').using DataAnon::Strategy::Field::RandomEmail.new('thoughtworks','com')
288
+ anonymize('Email').using FieldStrategy::RandomEmail.new('thoughtworks','com')
289
+ ```
290
+
291
+ ### GmailTemplate
292
+ Generates a valid unique gmail address by taking advantage of the gmail + strategy. Takes in a valid gmail username and
293
+ generates emails of the form username+<number>@gmail.com
294
+ ```ruby
295
+ anonymize('Email').using FieldStrategy::GmailTemplate.new('username')
137
296
  ```
138
297
 
139
298
  ### RandomMailinatorEmail
140
299
  Generates random email using mailinator hostname. e.g. <randomstring>@mailinator.com
141
300
  ```ruby
142
- anonymize('DateOfBirth').using DataAnon::Strategy::Field::RandomMailinatorEmail.new
301
+ anonymize('Email').using FieldStrategy::RandomMailinatorEmail.new
143
302
  ```
144
303
 
145
304
  ### RandomUserName
305
+ Generates random user name of same length as original user name.
306
+ ```ruby
307
+ anonymize('Username').using FieldStrategy::RandomUserName.new
308
+ ```
309
+
146
310
  ### RandomFirstName
311
+ Randomly picks up first name from the predefined list in the file. Default [file](https://raw.github.com/sunitparekh/data-anonymization/master/resources/first_names.txt) is part of the gem.
312
+ File should contain first name on each line.
313
+ ```ruby
314
+ anonymize('FirstName').using FieldStrategy::RandomFirstName.new
315
+ ```
316
+ ```ruby
317
+ anonymize('FirstName').using FieldStrategy::RandomFirstName.new('my_first_names.txt')
318
+ ```
319
+
147
320
  ### RandomLastName
321
+ Randomly picks up last name from the predefined list in the file. Default [file](https://raw.github.com/sunitparekh/data-anonymization/master/resources/last_names.txt) is part of the gem.
322
+ File should contain last name on each line.
323
+ ```ruby
324
+ anonymize('LastName').using FieldStrategy::RandomLastName.new
325
+ ```
326
+ ```ruby
327
+ anonymize('LastName').using FieldStrategy::RandomLastName.new('my_last_names.txt')
328
+ ```
329
+
148
330
  ### RandomFullName
149
- ### RandomInt
150
- ### RandomIntegerDelta
151
- ### RandomFloatDelta
331
+ Generates full name using the RandomFirstName and RandomLastName strategies.
332
+ It also creates the s
333
+ ```ruby
334
+ anonymize('FullName').using FieldStrategy::RandomFullName.new
335
+ ```
336
+ ```ruby
337
+ anonymize('FullName').using FieldStrategy::RandomFullName.new('my_first_names.txt', 'my_last_names.txt')
338
+ ```
152
339
 
153
- - - -
340
+ ### RandomInteger
341
+ Generates random integer number between given two numbers. Default range is 0 to 100.
342
+ ```ruby
343
+ anonymize('Age').using FieldStrategy::RandomInteger.new(18,70)
344
+ ```
154
345
 
346
+ ### RandomIntegerDelta
347
+ Shifts the current value randomly within given delta + and -. Default is 10
348
+ ```ruby
349
+ anonymize('Age').using FieldStrategy::RandomIntegerDelta.new(2)
350
+ ```
155
351
 
352
+ ### RandomFloat
353
+ Generates random float number between given two numbers. Default range is 0.0 to 100.0
354
+ ```ruby
355
+ anonymize('points').using FieldStrategy::RandomFloat.new(3.0,5.0)
356
+ ```
156
357
 
157
- - - -
358
+ ### RandomFloatDelta
359
+ Shifts the current value randomly within given delta + and -. Default is 10.0
360
+ ```ruby
361
+ anonymize('points').using FieldStrategy::RandomFloatDelta.new(2.5)
362
+ ```
158
363
 
159
364
  ## Write your own field strategies
160
365
  field parameter in following code is [DataAnon::Core::Field](#dataanon-core-field)
@@ -185,11 +390,16 @@ write your own anonymous field strategies within DSL,
185
390
  ## Default field strategies
186
391
 
187
392
  ```ruby
188
- # Work in progress...
189
- DEFAULT_STRATEGIES = {:string => FS::LoremIpsum.new,
190
- :integer => FS::RandomInt.new(18,70),
191
- :datetime => FS::DateTimeDelta.new,
192
- :boolean => FS::RandomBoolean.new
393
+ # Work in progress... TO BE COMPLETED
394
+ DEFAULT_STRATEGIES = {:string => FieldStrategy::LoremIpsum.new,
395
+ :fixnum => FieldStrategy::RandomIntegerDelta.new(5),
396
+ :bignum => FieldStrategy::RandomIntegerDelta.new(5000),
397
+ :float => FieldStrategy::RandomFloatDelta.new(5.0),
398
+ :datetime => FieldStrategy::DateTimeDelta.new,
399
+ :time => FieldStrategy::TimeDelta.new,
400
+ :date => FieldStrategy::DateDelta.new,
401
+ :trueclass => FieldStrategy::RandomBoolean.new,
402
+ :falseclass => FieldStrategy::RandomBoolean.new
193
403
  }
194
404
  ```
195
405
 
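The keys of the `DEFAULT_STRATEGIES` table above (`:fixnum`, `:trueclass`, ...) appear to be lower-cased Ruby class names. A minimal sketch of how a type-based default lookup of that kind could work is shown below. It is not the gem's implementation: the lambdas merely stand in for strategy objects, and `SKETCH_DEFAULTS`, `default_strategy_for` and `anonymize_with_default` are hypothetical names. On Ruby 2.4 and later `Fixnum` and `Bignum` are unified into `Integer`, so the sketch uses an `:integer` key instead.

```ruby
# Hypothetical stand-ins for strategy objects: each lambda takes the
# original value and returns an anonymized one.
SKETCH_DEFAULTS = {
  string:     ->(value) { 'x' * value.length },      # same-length filler text
  integer:    ->(value) { value + rand(-5..5) },     # shift within +/- 5
  float:      ->(value) { value + rand(-5.0..5.0) }, # shift within +/- 5.0
  trueclass:  ->(_value) { [true, false].sample },
  falseclass: ->(_value) { [true, false].sample }
}.freeze

# Looks up a default by the value's lower-cased class name,
# e.g. "abc" -> :string, 42 -> :integer.
def default_strategy_for(value)
  key = value.class.name.downcase.to_sym
  SKETCH_DEFAULTS.fetch(key) { raise ArgumentError, "no default strategy for #{key}" }
end

# Applies the default for the value's type.
def anonymize_with_default(value)
  default_strategy_for(value).call(value)
end
```

With this shape, overriding a default (as `default_field_strategies` does in the DSL) amounts to merging a replacement entry into the table before the lookup.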
@@ -198,23 +408,18 @@ Overriding default field strategies,
198
408
  ```ruby
199
409
  database 'Chinook' do
200
410
  ...
201
- default_field_strategies :string => DataAnon::Strategy::Field::RandomString.new
411
+ default_field_strategies :string => FieldStrategy::RandomString.new
202
412
  ...
203
413
  end
204
414
  ```
205
415
 
206
- ## Examples
207
-
208
- 1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
209
- 2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
210
-
211
-
212
416
  ## Logging
213
417
 
214
- `Progress Logger` provides progress of anonymization execution table by table.
418
+ How do I switch off the progress bar?
215
419
 
216
420
  ```ruby
217
- DataAnon::Utils::Logging.progress_logger.level = Logger::WARN
421
+ # add following line in your ruby file
422
+ ENV['show_progress'] = 'false'
218
423
  ```
219
424
 
220
425
  `Logger` provides debug level messages including database queries of active record.
@@ -225,15 +430,35 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
225
430
 
226
431
  ## Changelog
227
432
 
433
+ #### 0.2.0
434
+
435
+ 1. Added a progress bar using the 'powerbar' gem, which also shows the ETA for each table.
436
+ 2. Added more strategies
437
+ 3. Fixed default anonymization strategies for boolean and integer values
438
+ 4. Added support for composite primary key
228
439
 
229
- ### 0.1.1 (August 13, 2012)
440
+ #### 0.1.2 (August 14, 2012)
230
441
 
231
442
  1. First initial release
232
443
 
233
- ## What's plan ahead?
444
+ ## Roadmap
445
+
446
+ #### 0.2.0
447
+
448
+ 1. Complete list of all the field strategies planned supporting all data types
449
+
450
+ #### 0.3.0
234
451
 
235
452
  1. Run anonymization in parallel threads (performance enhancements)
236
- 2. MongoDB anonymization support (NoSQL document based database support)
453
+
454
+ #### 0.4.0
455
+
456
+ 1. MongoDB anonymization support (NoSQL document based database support)
457
+
458
+ #### 0.5.0
459
+
460
+ 1. Generate DSL from database and build schema from source as part of Whitelist approach.
461
+
237
462
 
238
463
  ## Want to contribute?
239
464
 
@@ -249,7 +474,7 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
249
474
 
250
475
  ## Credits
251
476
 
252
- - [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make if open source.
477
+ - [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make it open source.
253
478
  - [Birinder](https://twitter.com/birinder_) and [Panda](https://twitter.com/sarbashrestha) for reviewing the documentation.
254
479
 
255
480