data-anonymization 0.1.2 → 0.2.0
- data/.documentup.json +1 -0
- data/.travis.yml +0 -1
- data/README.md +277 -52
- data/blacklist_dsl.rb +1 -3
- data/data-anonymization.gemspec +4 -0
- data/lib/core/dsl.rb +1 -1
- data/lib/data-anonymization.rb +3 -0
- data/lib/strategy/base.rb +21 -11
- data/lib/strategy/blacklist.rb +2 -1
- data/lib/strategy/field/contact/geojson_base.rb +24 -0
- data/lib/strategy/field/contact/random_address.rb +17 -0
- data/lib/strategy/field/contact/random_city.rb +17 -0
- data/lib/strategy/field/contact/random_phone_number.rb +13 -0
- data/lib/strategy/field/contact/random_province.rb +17 -0
- data/lib/strategy/field/contact/random_zipcode.rb +17 -0
- data/lib/strategy/field/datetime/anonymize_date.rb +39 -0
- data/lib/strategy/field/datetime/anonymize_datetime.rb +15 -0
- data/lib/strategy/field/datetime/anonymize_time.rb +58 -0
- data/lib/strategy/field/datetime/date_delta.rb +21 -0
- data/lib/strategy/field/{date_time_delta.rb → datetime/date_time_delta.rb} +3 -3
- data/lib/strategy/field/datetime/time_delta.rb +12 -0
- data/lib/strategy/field/default_anon.rb +12 -7
- data/lib/strategy/field/email/gmail_template.rb +16 -0
- data/lib/strategy/field/{random_email.rb → email/random_email.rb} +0 -0
- data/lib/strategy/field/{random_mailinator_email.rb → email/random_mailinator_email.rb} +0 -2
- data/lib/strategy/field/fields.rb +51 -20
- data/lib/strategy/field/name/random_first_name.rb +14 -0
- data/lib/strategy/field/{random_full_name.rb → name/random_full_name.rb} +0 -0
- data/lib/strategy/field/name/random_last_name.rb +14 -0
- data/lib/strategy/field/{random_user_name.rb → name/random_user_name.rb} +0 -0
- data/lib/strategy/field/number/random_float.rb +23 -0
- data/lib/strategy/field/{random_float_delta.rb → number/random_float_delta.rb} +2 -4
- data/lib/strategy/field/{random_int.rb → number/random_integer.rb} +1 -1
- data/lib/strategy/field/{random_integer_delta.rb → number/random_integer_delta.rb} +2 -5
- data/lib/strategy/field/{random_phone_number.rb → string/formatted_string_numbers.rb} +4 -1
- data/lib/strategy/field/{lorem_ipsum.rb → string/lorem_ipsum.rb} +0 -0
- data/lib/strategy/field/{random_string.rb → string/random_string.rb} +0 -0
- data/lib/strategy/field/{distinct_column_values.rb → string/select_from_database.rb} +2 -3
- data/lib/strategy/field/string/select_from_file.rb +18 -0
- data/lib/strategy/field/string/select_from_list.rb +17 -0
- data/lib/strategy/field/{string_template.rb → string/string_template.rb} +0 -0
- data/lib/strategy/whitelist.rb +4 -2
- data/lib/utils/database.rb +8 -6
- data/lib/utils/geojson_parser.rb +42 -0
- data/lib/utils/logging.rb +0 -9
- data/lib/utils/progress_bar.rb +29 -0
- data/lib/utils/random_float.rb +12 -0
- data/lib/utils/random_int.rb +3 -7
- data/lib/utils/resource.rb +4 -0
- data/lib/version.rb +1 -1
- data/resources/UK_addresses.geojson +300 -0
- data/resources/US_addresses.geojson +300 -0
- data/spec/acceptance/rdbms_blacklist_spec.rb +2 -2
- data/spec/acceptance/rdbms_whitelist_spec.rb +6 -8
- data/spec/resource/sample.geojson +1 -0
- data/spec/spec_helper.rb +3 -2
- data/spec/strategy/field/contact/random_address_spec.rb +12 -0
- data/spec/strategy/field/contact/random_city_spec.rb +14 -0
- data/spec/strategy/field/contact/random_phone_number_spec.rb +16 -0
- data/spec/strategy/field/contact/random_province_spec.rb +14 -0
- data/spec/strategy/field/contact/random_zipcode_spec.rb +14 -0
- data/spec/strategy/field/datetime/anonymize_date_spec.rb +27 -0
- data/spec/strategy/field/datetime/anonymize_datetime_spec.rb +57 -0
- data/spec/strategy/field/datetime/anonymize_time_spec.rb +57 -0
- data/spec/strategy/field/datetime/date_delta_spec.rb +36 -0
- data/spec/strategy/field/{date_time_delta_spec.rb → datetime/date_time_delta_spec.rb} +3 -2
- data/spec/strategy/field/datetime/time_delta_spec.rb +44 -0
- data/spec/strategy/field/default_anon_spec.rb +42 -0
- data/spec/strategy/field/email/gmail_template_spec.rb +17 -0
- data/spec/strategy/field/{random_email_spec.rb → email/random_email_spec.rb} +2 -2
- data/spec/strategy/field/email/random_mailinator_email_spec.rb +14 -0
- data/spec/strategy/field/{random_first_name_spec.rb → name/random_first_name_spec.rb} +2 -2
- data/spec/strategy/field/{random_full_name_spec.rb → name/random_full_name_spec.rb} +2 -2
- data/spec/strategy/field/{random_last_name_spec.rb → name/random_last_name_spec.rb} +2 -2
- data/spec/strategy/field/{random_user_name_spec.rb → name/random_user_name_spec.rb} +2 -2
- data/spec/strategy/field/{random_float_delta_spec.rb → number/random_float_delta_spec.rb} +2 -2
- data/spec/strategy/field/number/random_float_spec.rb +28 -0
- data/spec/strategy/field/{random_integer_delta_spec.rb → number/random_integer_delta_spec.rb} +3 -5
- data/spec/strategy/field/{random_int_spec.rb → number/random_integer_spec.rb} +4 -4
- data/spec/strategy/field/random_boolean_spec.rb +2 -2
- data/spec/strategy/field/string/formatted_string_numbers_spec.rb +15 -0
- data/spec/strategy/field/{lorem_ipsum_spec.rb → string/lorem_ipsum_spec.rb} +2 -2
- data/spec/strategy/field/{random_string_spec.rb → string/random_string_spec.rb} +2 -2
- data/spec/strategy/field/{distinct_column_values_spec.rb → string/select_from_database_spec.rb} +3 -3
- data/spec/strategy/field/{random_selection_spec.rb → string/select_from_list_spec.rb} +5 -5
- data/spec/strategy/field/{string_template_spec.rb → string/string_template_spec.rb} +2 -2
- data/spec/strategy/field/whitelist_spec.rb +2 -2
- data/spec/support/customer_sample.rb +1 -1
- data/spec/utils/database_spec.rb +2 -2
- data/spec/utils/geojson_parser_spec.rb +38 -0
- data/whitelist_dsl.rb +4 -6
- metadata +163 -59
- data/lib/strategy/field/anonymize_time.rb +0 -57
- data/lib/strategy/field/gmail_template.rb +0 -17
- data/lib/strategy/field/random_first_name.rb +0 -18
- data/lib/strategy/field/random_last_name.rb +0 -19
- data/lib/strategy/field/random_selection.rb +0 -23
- data/lib/strategy/field/user_name_template.rb +0 -22
- data/spec/strategy/field/anonymize_time_spec.rb +0 -23
- data/spec/strategy/field/gmail_template_spec.rb +0 -14
- data/spec/strategy/field/random_mailinator_email_spec.rb +0 -21
- data/spec/strategy/field/random_phone_number_spec.rb +0 -35
- data/spec/strategy/field/user_name_template_spec.rb +0 -13
data/.documentup.json
CHANGED
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -1,16 +1,19 @@
|
|
1
1
|
# Data::Anonymization
|
2
|
-
Tool to create anonymized production data dump to use for
|
2
|
+
Tool to create anonymized production data dump to use for PERF and other TEST environments.
|
3
3
|
|
4
4
|
## Getting started
|
5
|
-
Install gem using:
|
5
|
+
Install the gem using (use the `--pre` option to try out the edge version):
|
6
6
|
|
7
7
|
$ gem install data-anonymization
|
8
8
|
|
9
|
+
Install required database adapter library for active record:
|
10
|
+
|
11
|
+
$ gem install sqlite3
|
12
|
+
|
9
13
|
Create a Ruby program using the data-anonymization DSL, as in the following `my_dsl.rb`:
|
10
14
|
|
11
15
|
```ruby
|
12
16
|
require 'data-anonymization'
|
13
|
-
DF = DataAnon::Strategy::Field
|
14
17
|
|
15
18
|
database 'DatabaseName' do
|
16
19
|
strategy DataAnon::Strategy::Blacklist # whitelist (default) or blacklist
|
@@ -18,10 +21,12 @@ database 'DatabaseName' do
|
|
18
21
|
# database config as active record connection hash
|
19
22
|
source_db :adapter => 'sqlite3', :database => 'sample-data/chinook-empty.sqlite'
|
20
23
|
|
24
|
+
# User -> table name (case sensitive)
|
21
25
|
table 'User' do
|
22
|
-
|
23
|
-
|
24
|
-
anonymize
|
26
|
+
# id, DateOfBirth, FirstName, LastName, UserName, Password -> table column names (case sensitive)
|
27
|
+
primary_key 'id' # composite key is also supported
|
28
|
+
anonymize 'DateOfBirth','FirstName','LastName' # uses default anonymization based on data types
|
29
|
+
anonymize('UserName').using FieldStrategy::StringTemplate.new('user#{row_number}')
|
25
30
|
anonymize('Password') { |field| "password" }
|
26
31
|
end
|
27
32
|
|
@@ -34,23 +39,26 @@ Run using:
|
|
34
39
|
|
35
40
|
$ ruby my_dsl.rb
|
36
41
|
|
37
|
-
|
38
|
-
|
42
|
+
## Examples
|
43
|
+
|
44
|
+
1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
|
45
|
+
2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
|
39
46
|
|
40
|
-
|
47
|
+
#### Share feedback
|
48
|
+
Please use Github [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback, feature suggestions and report issues.
|
41
49
|
|
42
50
|
## What is data anonymization?
|
43
51
|
|
44
|
-
For almost all
|
45
|
-
However, getting production data and using it is not feasible due to multiple reasons
|
52
|
+
For almost all projects there is a need for a production data dump in order to run performance tests, rehearse production releases and debug production issues.
|
53
|
+
However, getting production data and using it is not feasible due to multiple reasons, one of them being that personal user data would be exposed. And thus arises the need for data anonymization.
|
46
54
|
This tool helps you to get anonymized production data dump using either Blacklist or Whitelist strategies.
|
47
55
|
|
48
56
|
## Anonymization Strategies
|
49
57
|
|
50
58
|
### Blacklist
|
51
|
-
This approach
|
52
|
-
Blacklist create a copy of prod database and
|
53
|
-
|
59
|
+
This approach essentially leaves all fields unchanged with the exception of those specified by the user, which are scrambled/anonymized (hence the name blacklist).
|
60
|
+
Blacklist creates a copy of the prod database and chooses the fields to be anonymized, e.g. username, password, email, name, geo location etc., based on user specification. Most of the fields have different rules, e.g. password is always set to the same value for all users, while email needs to be valid.
|
61
|
+
The problem with this approach is that when new fields are added they will not be anonymized by default. Human error in omitting users' personal data could be damaging.
|
54
62
|
|
55
63
|
```ruby
|
56
64
|
database 'DatabaseName' do
|
@@ -61,10 +69,11 @@ end
|
|
61
69
|
```
|
62
70
|
|
63
71
|
### Whitelist
|
64
|
-
This approach
|
65
|
-
By default all data needs to be anonymized. So from production database sanitizing the data record by record and insert anonymized data into destination database. Source database
|
66
|
-
|
67
|
-
|
72
|
+
This approach by default scrambles/anonymizes all fields except a list of fields which are allowed to be copied as is. Hence the name whitelist.
|
73
|
+
By default all data needs to be anonymized. So the data from the production database is sanitized record by record and the anonymized data is inserted into the destination database. The source database needs only read-only access.
|
74
|
+
All fields would be anonymized using default anonymization strategies based on the datatype, unless an anonymization strategy is specified. For instance special strategies could be used for emails, passwords, usernames etc.
|
75
|
+
A list of whitelisted fields marks the data that is okay to copy as is, where anonymization isn't required.
|
76
|
+
This way any new field will be anonymized by default and if we need them as is, add it to the whitelist explicitly. This prevents any human error and protects sensitive information.
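The difference between the two strategies can be sketched in plain Ruby, independent of the gem. The helper names `blacklist_copy` and `whitelist_copy` and the `SCRAMBLE` lambda are hypothetical and exist only for this illustration; they are not part of the gem's API.

```ruby
# Hypothetical stand-in for a field strategy: replaces any value with a fixed marker.
SCRAMBLE = ->(_value) { 'xxx' }

# Blacklist: copy everything as is, anonymize only the listed fields.
def blacklist_copy(record, blacklisted)
  record.map { |name, value| [name, blacklisted.include?(name) ? SCRAMBLE.call(value) : value] }.to_h
end

# Whitelist: anonymize everything, copy only the listed fields as is.
def whitelist_copy(record, whitelisted)
  record.map { |name, value| [name, whitelisted.include?(name) ? value : SCRAMBLE.call(value)] }.to_h
end

# 'Ssn' stands for a newly added column that nobody remembered to configure.
record = { 'id' => 1, 'Email' => 'a@example.com', 'Ssn' => '123-45-6789' }
p blacklist_copy(record, ['Email'])   # Ssn is copied unchanged
p whitelist_copy(record, ['id'])      # Ssn is anonymized by default
```

The unconfigured column leaks under the blacklist and is scrambled under the whitelist, which is the human-error argument made above.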
|
68
77
|
|
69
78
|
```ruby
|
70
79
|
database 'DatabaseName' do
|
@@ -91,70 +100,266 @@ has following attribute accessor
|
|
91
100
|
Default anonymization strategy for `string` content. Uses default 'Lorem ipsum...' text or text supplied in strategy to generate same length string.
|
92
101
|
|
93
102
|
```ruby
|
94
|
-
anonymize('UserName').using
|
103
|
+
anonymize('UserName').using FieldStrategy::LoremIpsum.new
|
95
104
|
```
|
96
105
|
```ruby
|
97
|
-
anonymize('UserName').using
|
106
|
+
anonymize('UserName').using FieldStrategy::LoremIpsum.new("very large string....")
|
98
107
|
```
|
99
108
|
```ruby
|
100
|
-
anonymize('UserName').using
|
109
|
+
anonymize('UserName').using FieldStrategy::LoremIpsum.new(File.read('my_file.txt'))
|
101
110
|
```
|
102
111
|
|
103
112
|
### RandomString
|
104
113
|
Generates random string of same length.
|
105
114
|
```ruby
|
106
|
-
anonymize('UserName').using
|
115
|
+
anonymize('UserName').using FieldStrategy::RandomString.new
|
107
116
|
```
|
108
117
|
|
109
118
|
### StringTemplate
|
110
119
|
Simple string evaluation within [DataAnon::Core::Field](#dataanon-core-field) context. Can be used for email, username anonymization.
|
111
120
|
Make sure to put the string in 'single quote' else it will get evaluated inline.
|
112
121
|
```ruby
|
113
|
-
anonymize('UserName').using
|
122
|
+
anonymize('UserName').using FieldStrategy::StringTemplate.new('user#{row_number}')
|
123
|
+
```
|
124
|
+
```ruby
|
125
|
+
anonymize('Email').using FieldStrategy::StringTemplate.new('valid.address+#{row_number}@gmail.com')
|
126
|
+
```
|
127
|
+
```ruby
|
128
|
+
anonymize('Email').using FieldStrategy::StringTemplate.new('useremail#{row_number}@mailinator.com')
|
129
|
+
```
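The single-quote rule follows from how Ruby treats string literals: a double-quoted string interpolates `#{...}` immediately, while a single-quoted string keeps it as literal text that can be evaluated later, once per row. A minimal sketch of such deferred evaluation, assuming a stand-in `Row` struct and a hypothetical `render` helper rather than the gem's own classes:

```ruby
# Stand-in for the per-row context; the gem's real context is DataAnon::Core::Field.
Row = Struct.new(:row_number)

# Hypothetical helper: evaluates the template as a double-quoted string
# inside the row's context, so #{row_number} resolves for each row.
def render(template, row)
  row.instance_eval('"' + template + '"')
end

template = 'user#{row_number}'   # single quotes: stays literal until rendered
puts render(template, Row.new(1))
puts render(template, Row.new(2))
```

Evaluating strings as code is only reasonable here because the template comes from the author of the DSL script, not from untrusted input.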
|
130
|
+
|
131
|
+
### SelectFromList
|
132
|
+
Select randomly one of the values specified.
|
133
|
+
```ruby
|
134
|
+
anonymize('State').using FieldStrategy::SelectFromList.new(['New York','Georgia',...])
|
135
|
+
```
|
136
|
+
```ruby
|
137
|
+
anonymize('NameTitle').using FieldStrategy::SelectFromList.new(['Mr','Mrs','Dr',...])
|
138
|
+
```
|
139
|
+
|
140
|
+
### SelectFromFile
|
141
|
+
Similar to SelectFromList; the only difference is that the list of values is read from a file. A typical use is anonymizing a states field.
|
142
|
+
```ruby
|
143
|
+
anonymize('State').using FieldStrategy::SelectFromFile.new('states.txt')
|
144
|
+
```
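A minimal sketch of the idea in plain Ruby, assuming the file holds one value per line (the format of `states.txt` is not stated above) and using a hypothetical `load_values` helper that is not part of the gem:

```ruby
require 'tempfile'

# Hypothetical helper: loads one candidate value per line, skipping blank lines.
def load_values(path)
  File.readlines(path, chomp: true).reject(&:empty?)
end

# A temporary file stands in for states.txt.
Tempfile.create('states') do |file|
  file.write("New York\nGeorgia\nTexas\n")
  file.flush
  values = load_values(file.path)
  puts values.sample   # one of the three states, chosen at random
end
```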
|
145
|
+
|
146
|
+
### FormattedStringNumber
|
147
|
+
Keeps the format of the string and replaces each digit in it with a random digit.
|
148
|
+
```ruby
|
149
|
+
anonymize('CreditCardNumber').using FieldStrategy::FormattedStringNumber.new
|
150
|
+
```
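The digit-by-digit replacement can be sketched in a few lines of plain Ruby. The helper name `randomize_digits` is hypothetical and only illustrates the technique; it is not the gem's implementation.

```ruby
# Hypothetical helper: swaps every digit for a random one and leaves
# all other characters (spaces, dashes, brackets) untouched.
def randomize_digits(value)
  value.gsub(/\d/) { rand(10).to_s }
end

puts randomize_digits('4111-1111-1111-1111')
puts randomize_digits('+1 (555) 010-9999')
```

Because only digits are touched, length and separators survive, so the result still fits column constraints and display formats. It will not, in general, pass checksum validation such as a card number check digit.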
|
151
|
+
|
152
|
+
### SelectFromDatabase
|
153
|
+
Similar to SelectFromList; the difference is that the list of values is collected from a database table using a distinct column query.
|
154
|
+
```ruby
|
155
|
+
# values are collected using `select distinct state from customers` query
|
156
|
+
anonymize('State').using FieldStrategy::SelectFromDatabase.new('customers','state')
|
157
|
+
```
|
158
|
+
|
159
|
+
### RandomAddress
|
160
|
+
Generates address using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
|
161
|
+
The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
|
162
|
+
```ruby
|
163
|
+
anonymize('Address').using FieldStrategy::RandomAddress.region_US
|
164
|
+
```
|
165
|
+
```ruby
|
166
|
+
anonymize('Address').using FieldStrategy::RandomAddress.region_UK
|
167
|
+
```
|
168
|
+
```ruby
|
169
|
+
# get your own geo_json file and use it
|
170
|
+
anonymize('Address').using FieldStrategy::RandomAddress.new('my_geo_json.json')
|
171
|
+
```
|
172
|
+
|
173
|
+
### RandomCity
|
174
|
+
Similar to RandomAddress, generates city using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
|
175
|
+
The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
|
176
|
+
```ruby
|
177
|
+
anonymize('City').using FieldStrategy::RandomCity.region_US
|
178
|
+
```
|
179
|
+
```ruby
|
180
|
+
anonymize('City').using FieldStrategy::RandomCity.region_UK
|
181
|
+
```
|
182
|
+
```ruby
|
183
|
+
# get your own geo_json file and use it
|
184
|
+
anonymize('City').using FieldStrategy::RandomCity.new('my_geo_json.json')
|
185
|
+
```
|
186
|
+
|
187
|
+
### RandomProvince
|
188
|
+
Similar to RandomAddress, generates province using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
|
189
|
+
The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
|
190
|
+
```ruby
|
191
|
+
anonymize('Province').using FieldStrategy::RandomProvince.region_US
|
192
|
+
```
|
193
|
+
```ruby
|
194
|
+
anonymize('Province').using FieldStrategy::RandomProvince.region_UK
|
195
|
+
```
|
196
|
+
```ruby
|
197
|
+
# get your own geo_json file and use it
|
198
|
+
anonymize('Province').using FieldStrategy::RandomProvince.new('my_geo_json.json')
|
199
|
+
```
|
200
|
+
|
201
|
+
### RandomZipcode
|
202
|
+
Similar to RandomAddress, generates zipcode using the [geojson](http://www.geojson.org/geojson-spec.html) format file. The default US/UK file chooses randomly from 300 addresses.
|
203
|
+
The large data set can be downloaded from [here](http://www.infochimps.com/datasets/simplegeo-places-dump)
|
204
|
+
```ruby
|
205
|
+
anonymize('Address').using FieldStrategy::RandomZipcode.region_US
|
206
|
+
```
|
207
|
+
```ruby
|
208
|
+
anonymize('Address').using FieldStrategy::RandomZipcode.region_UK
|
209
|
+
```
|
210
|
+
```ruby
|
211
|
+
# get your own geo_json file and use it
|
212
|
+
anonymize('Address').using FieldStrategy::RandomZipcode.new('my_geo_json.json')
|
213
|
+
```
|
214
|
+
|
215
|
+
### RandomPhoneNumber
|
216
|
+
Keeps the format of the string and replaces each digit in it with a random digit.
|
217
|
+
```ruby
|
218
|
+
anonymize('PhoneNumber').using FieldStrategy::RandomPhoneNumber.new
|
219
|
+
```
|
220
|
+
|
221
|
+
### AnonymizeDateTime
|
222
|
+
Anonymizes each field (except year and seconds) within its natural range (e.g. hour between 1-24 and day within the month) based on the true/false
|
223
|
+
input for that field. By default, all fields are anonymized.
|
224
|
+
```ruby
|
225
|
+
#anonymizes month and hour fields, leaving the day and minute fields untouched
|
226
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.new(true,false,true,false)
|
227
|
+
```
|
228
|
+
|
229
|
+
In addition to customizing which fields you want anonymized, there are some helper methods which allow for quick anonymization
|
230
|
+
```ruby
|
231
|
+
# anonymizes only the month field
|
232
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_month
|
233
|
+
# anonymizes only the day field
|
234
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_day
|
235
|
+
# anonymizes only the hour field
|
236
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_hour
|
237
|
+
# anonymizes only the minute field
|
238
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDateTime.only_minute
|
114
239
|
```
|
240
|
+
|
241
|
+
### AnonymizeTime
|
242
|
+
Identical to the AnonymizeDateTime strategy above, except that the returned object is of type `Time`.
|
243
|
+
|
244
|
+
### AnonymizeDate
|
245
|
+
Anonymizes the day and month fields within their natural range based on the true/false input for each field. By default both fields are
|
246
|
+
anonymized
|
115
247
|
```ruby
|
116
|
-
|
248
|
+
# anonymizes month and leaves day unchanged
|
249
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.new(true,false)
|
117
250
|
```
|
251
|
+
|
252
|
+
In addition to customizing which fields you want anonymized, there are some helper methods which allow for quick anonymization
|
118
253
|
```ruby
|
119
|
-
|
254
|
+
# anonymizes only the month field
|
255
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.only_month
|
256
|
+
# anonymizes only the day field
|
257
|
+
anonymize('DateOfBirth').using FieldStrategy::AnonymizeDate.only_day
|
120
258
|
```
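A rough sketch of what "within natural range" can mean for dates, in plain Ruby. The helper `anonymize_date` and its keyword arguments are hypothetical, and clamping a kept day to the length of a newly chosen month is a choice made for this sketch, not documented behavior of the gem.

```ruby
require 'date'

# Hypothetical helper mirroring the two true/false flags; the year is always kept.
def anonymize_date(date, anonymize_month: true, anonymize_day: true)
  month = anonymize_month ? rand(1..12) : date.month
  # Date.new(year, month, -1) is the last day of that month.
  days_in_month = Date.new(date.year, month, -1).day
  day = anonymize_day ? rand(1..days_in_month) : [date.day, days_in_month].min
  Date.new(date.year, month, day)
end

birthday = Date.new(1980, 1, 31)
puts anonymize_date(birthday)                         # random month and day in 1980
puts anonymize_date(birthday, anonymize_day: false)   # random month, day kept where possible
```

Picking the day only after the month is known avoids invalid dates such as February 31.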
|
121
259
|
|
122
260
|
### DateTimeDelta
|
123
261
|
Shifts the date and time randomly within the given range. By default shifts the date within 10 days + or - and the time within 30 minutes.
|
124
262
|
```ruby
|
125
|
-
anonymize('DateOfBirth').using
|
263
|
+
anonymize('DateOfBirth').using FieldStrategy::DateTimeDelta.new
|
126
264
|
```
|
127
265
|
```ruby
|
128
266
|
# shifts date within 20 days and time within 50 minutes
|
129
|
-
anonymize('DateOfBirth').using
|
267
|
+
anonymize('DateOfBirth').using FieldStrategy::DateTimeDelta.new(20, 50)
|
268
|
+
```
|
269
|
+
|
270
|
+
### TimeDelta
|
271
|
+
Identical to the DateTimeDelta strategy above, except that the returned object is of type `Time`.
|
272
|
+
|
273
|
+
### DateDelta
|
274
|
+
|
275
|
+
Shifts the date randomly within the given delta range. By default shifts the date within 10 days + or -.
|
276
|
+
```ruby
|
277
|
+
anonymize('DateOfBirth').using FieldStrategy::DateDelta.new
|
278
|
+
```
|
279
|
+
```ruby
|
280
|
+
# shifts date within 25 days
|
281
|
+
anonymize('DateOfBirth').using FieldStrategy::DateDelta.new(25)
|
130
282
|
```
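The date shift itself is simple arithmetic: add a random whole number of days drawn from `[-delta, +delta]`. A minimal sketch in plain Ruby, where `shift_date` is a hypothetical helper and not the gem's code:

```ruby
require 'date'

# Hypothetical helper: moves the date by a random whole number of days
# between -max_days and +max_days, inclusive.
def shift_date(date, max_days = 10)
  date + rand(-max_days..max_days)
end

original = Date.new(2012, 8, 14)
puts shift_date(original)       # within 10 days of the original
puts shift_date(original, 25)   # within 25 days of the original
```

Unlike AnonymizeDate, a delta keeps the value close to the original, which preserves rough ordering and age distributions in the anonymized data.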
|
131
283
|
|
132
284
|
### RandomEmail
|
133
285
|
Generates email randomly using the given HOSTNAME and TLD.
|
134
286
|
By default generates the hostname randomly along with the email id.
|
135
287
|
```ruby
|
136
|
-
anonymize('
|
288
|
+
anonymize('Email').using FieldStrategy::RandomEmail.new('thoughtworks','com')
|
289
|
+
```
|
290
|
+
|
291
|
+
### GmailTemplate
|
292
|
+
Generates a valid unique gmail address by taking advantage of the gmail + strategy. Takes in a valid gmail username and
|
293
|
+
generates emails of the form username+<number>@gmail.com
|
294
|
+
```ruby
|
295
|
+
anonymize('Email').using FieldStrategy::GmailTemplate.new('username')
|
137
296
|
```
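The "+" technique relies on Gmail delivering `name+anything@gmail.com` to the inbox of `name@gmail.com`, so every anonymized row gets a distinct address that still reaches one real mailbox. A minimal sketch, assuming the number is the row number (the source only says `<number>`) and using a hypothetical helper name:

```ruby
# Hypothetical helper: derives a distinct address per row from one real inbox.
def gmail_address(username, row_number)
  "#{username}+#{row_number}@gmail.com"
end

puts gmail_address('username', 1)   # username+1@gmail.com
puts gmail_address('username', 2)   # username+2@gmail.com
```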
|
138
297
|
|
139
298
|
### RandomMailinatorEmail
|
140
299
|
Generates random email using mailinator hostname. e.g. <randomstring>@mailinator.com
|
141
300
|
```ruby
|
142
|
-
anonymize('
|
301
|
+
anonymize('Email').using FieldStrategy::RandomMailinatorEmail.new
|
143
302
|
```
|
144
303
|
|
145
304
|
### RandomUserName
|
305
|
+
Generates random user name of same length as original user name.
|
306
|
+
```ruby
|
307
|
+
anonymize('Username').using FieldStrategy::RandomUserName.new
|
308
|
+
```
|
309
|
+
|
146
310
|
### RandomFirstName
|
311
|
+
Randomly picks a first name from a predefined list in a file. The default [file](https://raw.github.com/sunitparekh/data-anonymization/master/resources/first_names.txt) is part of the gem.
|
312
|
+
The file should contain one first name per line.
|
313
|
+
```ruby
|
314
|
+
anonymize('FirstName').using FieldStrategy::RandomFirstName.new
|
315
|
+
```
|
316
|
+
```ruby
|
317
|
+
anonymize('FirstName').using FieldStrategy::RandomFirstName.new('my_first_names.txt')
|
318
|
+
```
|
319
|
+
|
147
320
|
### RandomLastName
|
321
|
+
Randomly picks a last name from a predefined list in a file. The default [file](https://raw.github.com/sunitparekh/data-anonymization/master/resources/last_names.txt) is part of the gem.
|
322
|
+
The file should contain one last name per line.
|
323
|
+
```ruby
|
324
|
+
anonymize('LastName').using FieldStrategy::RandomLastName.new
|
325
|
+
```
|
326
|
+
```ruby
|
327
|
+
anonymize('LastName').using FieldStrategy::RandomLastName.new('my_last_names.txt')
|
328
|
+
```
|
329
|
+
|
148
330
|
### RandomFullName
|
149
|
-
|
150
|
-
|
151
|
-
|
331
|
+
Generates full name using the RandomFirstName and RandomLastName strategies.
|
332
|
+
It also creates the s
|
333
|
+
```ruby
|
334
|
+
anonymize('FullName').using FieldStrategy::RandomFullName.new
|
335
|
+
```
|
336
|
+
```ruby
|
337
|
+
anonymize('FullName').using FieldStrategy::RandomFullName.new('my_first_names.txt', 'my_last_names.txt')
|
338
|
+
```
|
152
339
|
|
153
|
-
|
340
|
+
### RandomInteger
|
341
|
+
Generates random integer number between given two numbers. Default range is 0 to 100.
|
342
|
+
```ruby
|
343
|
+
anonymize('Age').using FieldStrategy::RandomInteger.new(18,70)
|
344
|
+
```
|
154
345
|
|
346
|
+
### RandomIntegerDelta
|
347
|
+
Shifts the current value randomly within given delta + and -. Default is 10
|
348
|
+
```ruby
|
349
|
+
anonymize('Age').using FieldStrategy::RandomIntegerDelta.new(2)
|
350
|
+
```
|
155
351
|
|
352
|
+
### RandomFloat
|
353
|
+
Generates random float number between given two numbers. Default range is 0.0 to 100.0
|
354
|
+
```ruby
|
355
|
+
anonymize('points').using FieldStrategy::RandomFloat.new(3.0,5.0)
|
356
|
+
```
|
156
357
|
|
157
|
-
|
358
|
+
### RandomFloatDelta
|
359
|
+
Shifts the current value randomly within given delta + and -. Default is 10.0
|
360
|
+
```ruby
|
361
|
+
anonymize('points').using FieldStrategy::RandomFloatDelta.new(2.5)
|
362
|
+
```
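Both delta strategies amount to adding a random offset drawn from `[-delta, +delta]` to the current value. A minimal sketch in plain Ruby, assuming the hypothetical helper names `shift_integer` and `shift_float`; these are not the gem's API, and the gem's exact random distribution is not specified here.

```ruby
# Hypothetical helper: integer offset between -delta and +delta, inclusive.
def shift_integer(value, delta = 10)
  value + rand(-delta..delta)
end

# Hypothetical helper: float offset between -delta and +delta.
def shift_float(value, delta = 10.0)
  value + rand(-delta..delta)
end

puts shift_integer(42, 2)    # somewhere between 40 and 44
puts shift_float(3.7, 2.5)   # somewhere between 1.2 and 6.2
```

A small delta hides the exact value while keeping it plausible; the plain RandomInteger and RandomFloat strategies ignore the original value entirely.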
|
158
363
|
|
159
364
|
## Write your own field strategies
|
160
365
|
field parameter in following code is [DataAnon::Core::Field](#dataanon-core-field)
|
@@ -185,11 +390,16 @@ write your own anonymous field strategies within DSL,
|
|
185
390
|
## Default field strategies
|
186
391
|
|
187
392
|
```ruby
|
188
|
-
# Work in progress...
|
189
|
-
DEFAULT_STRATEGIES = {:string =>
|
190
|
-
:
|
191
|
-
:
|
192
|
-
:
|
393
|
+
# Work in progress... TO BE COMPLETED
|
394
|
+
DEFAULT_STRATEGIES = {:string => FieldStrategy::LoremIpsum.new,
|
395
|
+
:fixnum => FieldStrategy::RandomIntegerDelta.new(5),
|
396
|
+
:bignum => FieldStrategy::RandomIntegerDelta.new(5000),
|
397
|
+
:float => FieldStrategy::RandomFloatDelta.new(5.0),
|
398
|
+
:datetime => FieldStrategy::DateTimeDelta.new,
|
399
|
+
:time => FieldStrategy::TimeDelta.new,
|
400
|
+
:date => FieldStrategy::DateDelta.new,
|
401
|
+
:trueclass => FieldStrategy::RandomBoolean.new,
|
402
|
+
:falseclass => FieldStrategy::RandomBoolean.new
|
193
403
|
}
|
194
404
|
```
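Keys such as `:trueclass` and `:falseclass` suggest that the default is looked up by the downcased class name of the field's value; that is an assumption, not something stated above. A sketch of such a lookup in plain Ruby, with lambdas standing in for the strategy objects and hypothetical names throughout. Ruby 2.4 and later report `Integer` where older versions reported `Fixnum` or `Bignum`, so the sketch uses an `:integer` key.

```ruby
# Lambdas stand in for strategy objects; keys are downcased Ruby class names.
DEFAULT_SKETCH = {
  string:     ->(value) { 'x' * value.length },
  integer:    ->(value) { value + rand(-5..5) },
  float:      ->(value) { value + rand(-5.0..5.0) },
  trueclass:  ->(_value) { [true, false].sample },
  falseclass: ->(_value) { [true, false].sample }
}.freeze

# Hypothetical lookup: picks the default by the value's class name.
def anonymize_default(value)
  key = value.class.name.downcase.to_sym
  DEFAULT_SKETCH.fetch(key).call(value)
end

puts anonymize_default('secret')   # xxxxxx
puts anonymize_default(30)         # between 25 and 35
puts anonymize_default(true)       # true or false
```

Using `fetch` makes an unsupported type fail loudly instead of silently copying the value.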
|
195
405
|
|
@@ -198,23 +408,18 @@ Overriding default field strategies,
|
|
198
408
|
```ruby
|
199
409
|
database 'Chinook' do
|
200
410
|
...
|
201
|
-
default_field_strategies :string =>
|
411
|
+
default_field_strategies :string => FieldStrategy::RandomString.new
|
202
412
|
...
|
203
413
|
end
|
204
414
|
```
|
205
415
|
|
206
|
-
## Examples
|
207
|
-
|
208
|
-
1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/whitelist_dsl.rb)
|
209
|
-
2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/blacklist_dsl.rb)
|
210
|
-
|
211
|
-
|
212
416
|
## Logging
|
213
417
|
|
214
|
-
|
418
|
+
How do I switch off the progress bar?
|
215
419
|
|
216
420
|
```ruby
|
217
|
-
|
421
|
+
# add following line in your ruby file
|
422
|
+
ENV['show_progress'] = 'false'
|
218
423
|
```
|
219
424
|
|
220
425
|
`Logger` provides debug level messages including database queries of active record.
|
@@ -225,15 +430,35 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
|
|
225
430
|
|
226
431
|
## Changelog
|
227
432
|
|
433
|
+
#### 0.2.0
|
434
|
+
|
435
|
+
1. Added a progress bar using the 'powerbar' gem, which also shows the ETA for each table.
|
436
|
+
2. Added more strategies
|
437
|
+
3. Fixed default anonymization strategies for boolean and integer values
|
438
|
+
4. Added support for composite primary key
|
228
439
|
|
229
|
-
|
440
|
+
#### 0.1.2 (August 14, 2012)
|
230
441
|
|
231
442
|
1. First initial release
|
232
443
|
|
233
|
-
##
|
444
|
+
## Roadmap
|
445
|
+
|
446
|
+
#### 0.2.0
|
447
|
+
|
448
|
+
1. Complete list of all the field strategies planned supporting all data types
|
449
|
+
|
450
|
+
#### 0.3.0
|
234
451
|
|
235
452
|
1. Run anonymization in parallel threads (performance enhancements)
|
236
|
-
|
453
|
+
|
454
|
+
#### 0.4.0
|
455
|
+
|
456
|
+
1. MongoDB anonymization support (NoSQL document based database support)
|
457
|
+
|
458
|
+
#### 0.5.0
|
459
|
+
|
460
|
+
1. Generate DSL from database and build schema from source as part of Whitelist approach.
|
461
|
+
|
237
462
|
|
238
463
|
## Want to contribute?
|
239
464
|
|
@@ -249,7 +474,7 @@ DataAnon::Utils::Logging.logger.level = Logger::INFO
|
|
249
474
|
|
250
475
|
## Credits
|
251
476
|
|
252
|
-
- [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make
|
477
|
+
- [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make it open source.
|
253
478
|
- [Birinder](https://twitter.com/birinder_) and [Panda](https://twitter.com/sarbashrestha) for reviewing the documentation.
|
254
479
|
|
255
480
|
|