fluent-plugin-anonymizer 0.1.0 → 0.2.0

data/.travis.yml CHANGED
@@ -1,5 +1,6 @@
1
1
  language: ruby
2
2
 
3
3
  rvm:
4
+ - 2.1.0
4
5
  - 2.0.0
5
6
  - 1.9.3
data/README.md CHANGED
@@ -2,10 +2,12 @@
2
2
 
3
3
  ## Overview
4
4
 
5
- Fluentd filter output plugin to anonymize records with MD5/SHA1/SHA256/SHA384/SHA512 algorithms. This data masking plugin protects privacy data such as ID, email, phone number, IPv4/IPv6 address and so on.
5
+ Fluentd filter output plugin to anonymize records with [OpenSSL::HMAC](http://docs.ruby-lang.org/ja/1.9.3/class/OpenSSL=3a=3aHMAC.html) of MD5/SHA1/SHA256/SHA384/SHA512 algorithms. This data masking plugin protects privacy data such as UserID, Email, Phone number, IPv4/IPv6 address and so on.
6
6
 
7
7
  ## Installation
8
8
 
9
+ Install with the gem or fluent-gem command:
10
+
9
11
  `````
10
12
  ### native gem
11
13
  gem install fluent-plugin-anonymizer
@@ -28,10 +30,19 @@ It is a sample to hash record with sha1 for `user_id`, `member_id` and `mail`. F
28
30
 
29
31
  <match test.message>
30
32
  type anonymizer
33
+
34
+ # Specify hashing keys with comma
31
35
  sha1_keys user_id, member_id, mail
36
+
37
+ # Set hash salt with any strings for more security
38
+ hash_salt mysaltstring
39
+
40
+ # Specify rounding address keys with comma and subnet mask
32
41
  ipaddr_mask_keys host
33
42
  ipv4_mask_subnet 24
34
43
  ipv6_mask_subnet 104
44
+
45
+ # Set tag rename pattern
35
46
  remove_tag_prefix test.
36
47
  add_tag_prefix anonymized.
37
48
  </match>
@@ -48,8 +59,8 @@ $ echo '{"host":"10.102.3.80","member_id":"12345", "mail":"example@example.com"}
48
59
  $ echo '{"host":"2001:db8:0:8d3:0:8a2e:70:7344","member_id":"12345", "mail":"example@example.com"}' | fluent-cat test.message
49
60
 
50
61
  $ tail -f /var/log/td-agent/td-agent.log
51
- 2014-01-06 18:30:21 +0900 anonymized.message: {"host":"10.102.3.0","member_id":"8cb2237d0679ca88db6464eac60da96345513964","mail":"914fec35ce8bfa1a067581032f26b053591ee38a"}
52
- 2014-01-06 18:30:22 +0900 anonymized.message: {"host":"2001:db8:0:8d3:0:8a2e::","member_id":"8cb2237d0679ca88db6464eac60da96345513964","mail":"914fec35ce8bfa1a067581032f26b053591ee38a"}
62
+ 2014-01-06 18:30:21 +0900 anonymized.message: {"host":"10.102.3.0","member_id":"61f6c1b5f19e0a7f73dd52a23534085bf01f2c67","mail":"eeb890d74b8c1c4cd1e35a3ea62166e0b770f4f4"}
63
+ 2014-01-06 18:30:22 +0900 anonymized.message: {"host":"2001:db8:0:8d3:0:8a2e::","member_id":"61f6c1b5f19e0a7f73dd52a23534085bf01f2c67","mail":"eeb890d74b8c1c4cd1e35a3ea62166e0b770f4f4"}
53
64
  `````
54
65
 
55
66
  ## Parameters
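The README sample above shows `host` values rounded to the configured subnet (`ipv4_mask_subnet 24`, `ipv6_mask_subnet 104`). A minimal sketch of that rounding with Ruby's standard `IPAddr` follows. The helper name `mask_address` and its keyword arguments are hypothetical, and the use of `IPAddr#mask` is an assumption, since the plugin's own masking line is not visible in this diff.

```ruby
require 'ipaddr'

# Hypothetical helper (not part of the plugin): round an address down to
# its network address, picking the prefix length by address family.
def mask_address(text, ipv4_bits: 24, ipv6_bits: 104)
  address = IPAddr.new(text)
  bits = address.ipv4? ? ipv4_bits : ipv6_bits
  address.mask(bits).to_s
end

puts mask_address('10.102.3.80')                    # 10.102.3.0
puts mask_address('2001:db8:0:8d3:0:8a2e:70:7344')  # 2001:db8:0:8d3:0:8a2e::
puts mask_address('10.102.3.80', ipv4_bits: 16)     # 10.102.0.0
```

The three results match the values shown in the README sample and in the test assertions of this diff.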
@@ -88,13 +99,18 @@ Add original tag name into filtered record using SetTagKeyMixin.
88
99
 
89
100
  At least one of these options must be set to edit the tag name using HandleTagNameMixin.
90
101
 
102
+ * tag
103
+
104
+ When this option is used with the tag placeholder, [like 'tag anonymized.${tag}'](https://github.com/y-ken/fluent-plugin-anonymizer/blob/master/test/plugin/test_out_anonymizer.rb#L153), the tag is rewritten after the remove_tag_prefix, remove_tag_suffix, add_tag_prefix and add_tag_suffix options have been applied.
105
+
91
106
  ## Notes
92
107
 
93
- * Hashing of nested values is compatible with the behavior of [LogStash::Filters::Anonymize](https://github.com/logstash/logstash/blob/master/lib/logstash/filters/anonymize.rb). For further details, see the test code at [test_emit_nest_value](https://github.com/y-ken/fluent-plugin-anonymizer/blob/master/test/plugin/test_out_anonymizer.rb#L98).
108
+ * Hashing of nested values is compatible with the behavior of [LogStash::Filters::Anonymize](https://github.com/logstash/logstash/blob/master/lib/logstash/filters/anonymize.rb). For further details, see the test code at [test_emit_nest_value](https://github.com/y-ken/fluent-plugin-anonymizer/blob/master/test/plugin/test_out_anonymizer.rb#L91).
94
109
 
95
110
  ## Blog Articles
96
111
 
97
- * http://y-ken.hatenablog.com/entry/fluent-plugin-anonymizer-has-released
112
+ * Released fluent-plugin-anonymizer, a plugin that obfuscates personal information #fluentd - Y-Ken Studio
113
+ http://y-ken.hatenablog.com/entry/fluent-plugin-anonymizer-has-released
98
114
 
99
115
  ## TODO
100
116
 
@@ -4,10 +4,10 @@ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
4
 
5
5
  Gem::Specification.new do |spec|
6
6
  spec.name = "fluent-plugin-anonymizer"
7
- spec.version = "0.1.0"
7
+ spec.version = "0.2.0"
8
8
  spec.authors = ["Kentaro Yoshida"]
9
9
  spec.email = ["y.ken.studio@gmail.com"]
10
- spec.summary = %q{Fluentd filter output plugin to anonymize records with MD5/SHA1/SHA256/SHA384/SHA512 algorithms. This data masking plugin protects privacy data such as ID, email, phone number, IPv4/IPv6 address and so on.}
10
+ spec.summary = %q{Fluentd filter output plugin to anonymize records with HMAC of MD5/SHA1/SHA256/SHA384/SHA512 algorithms. This data masking plugin protects privacy data such as UserID, Email, Phone number, IPv4/IPv6 address and so on.}
11
11
  spec.homepage = "https://github.com/y-ken/fluent-plugin-anonymizer"
12
12
  spec.license = "Apache License, Version 2.0"
13
13
 
@@ -2,6 +2,7 @@ class Fluent::AnonymizerOutput < Fluent::Output
2
2
  Fluent::Plugin.register_output('anonymizer', self)
3
3
 
4
4
  HASH_ALGORITHM = %w(md5 sha1 sha256 sha384 sha512 ipaddr_mask)
5
+ config_param :tag, :string, :default => nil
5
6
  config_param :hash_salt, :string, :default => ''
6
7
  config_param :ipv4_mask_subnet, :integer, :default => 24
7
8
  config_param :ipv6_mask_subnet, :integer, :default => 104
@@ -12,15 +13,15 @@ class Fluent::AnonymizerOutput < Fluent::Output
12
13
  config_set_default :include_tag_key, false
13
14
 
14
15
  DIGEST = {
15
- "md5" => Proc.new { Digest::MD5 },
16
- "sha1" => Proc.new { Digest::SHA1 },
17
- "sha256" => Proc.new { Digest::SHA256 },
18
- "sha384" => Proc.new { Digest::SHA384 },
19
- "sha512" => Proc.new { Digest::SHA512 }
16
+ "md5" => Proc.new { OpenSSL::Digest.new('md5') },
17
+ "sha1" => Proc.new { OpenSSL::Digest.new('sha1') },
18
+ "sha256" => Proc.new { OpenSSL::Digest.new('sha256') },
19
+ "sha384" => Proc.new { OpenSSL::Digest.new('sha384') },
20
+ "sha512" => Proc.new { OpenSSL::Digest.new('sha512') }
20
21
  }
21
22
 
22
23
  def initialize
23
- require 'digest/sha2'
24
+ require 'openssl'
24
25
  require 'ipaddr'
25
26
  super
26
27
  end
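The hunk above keeps the digest table as a hash of procs, now returning `OpenSSL::Digest` objects. As a rough illustration of what that shape gives you, the sketch below uses a hypothetical local table named `digest_factories`: nothing from OpenSSL is touched until a proc is called, and each call yields a separate digest object. This describes the observable effect only, not necessarily the author's reason for the design.

```ruby
require 'openssl'

# Hypothetical stand-in for the plugin's DIGEST constant: a table of
# factories, each building a digest object only when called.
digest_factories = {
  'sha1'   => proc { OpenSSL::Digest.new('sha1') },
  'sha256' => proc { OpenSSL::Digest.new('sha256') }
}

first  = digest_factories['sha256'].call
second = digest_factories['sha256'].call

puts first.equal?(second)                              # false: two distinct objects
puts first.hexdigest('a') == second.hexdigest('a')     # true: same algorithm, same result
```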
@@ -42,7 +43,7 @@ class Fluent::AnonymizerOutput < Fluent::Output
42
43
  end
43
44
  $log.info "anonymizer: adding anonymize rules for each field. #{@hash_keys}"
44
45
 
45
- if ( !@remove_tag_prefix && !@remove_tag_suffix && !@add_tag_prefix && !@add_tag_suffix )
46
+ if ( !@tag && !@remove_tag_prefix && !@remove_tag_suffix && !@add_tag_prefix && !@add_tag_suffix )
46
47
  raise Fluent::ConfigError, "anonymizer: missing remove_tag_prefix, remove_tag_suffix, add_tag_prefix or add_tag_suffix."
47
48
  end
48
49
  end
@@ -53,13 +54,25 @@ class Fluent::AnonymizerOutput < Fluent::Output
53
54
  next unless record.include?(hash_key)
54
55
  record[hash_key] = filter_anonymize_record(record[hash_key], hash_algorithm)
55
56
  end
56
- t = tag.dup
57
- filter_record(t, time, record)
58
- Fluent::Engine.emit(t, time, record)
57
+ emit_tag = tag.dup
58
+ filter_record(emit_tag, time, record)
59
+ emit_tag = rewrite_tag(@tag, emit_tag) if @tag
60
+ Fluent::Engine.emit(emit_tag, time, record)
59
61
  end
60
62
  chain.next
61
63
  end
62
64
 
65
+ def rewrite_tag(rewritetag, tag)
66
+ placeholder = {
67
+ '${tag}' => tag,
68
+ '__TAG__' => tag
69
+ }
70
+ return rewritetag.gsub(/(\${[a-z_]+(\[[0-9]+\])?}|__[A-Z_]+__)/) do
71
+ $log.warn "anonymizer: unknown placeholder found. :placeholder=>#{$1} :tag=>#{tag} :rewritetag=>#{rewritetag}" unless placeholder.include?($1)
72
+ placeholder[$1]
73
+ end
74
+ end
75
+
63
76
  def filter_anonymize_record(data, hash_algorithm)
64
77
  begin
65
78
  if data.is_a?(Array)
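The `emit` hunk above applies the prefix/suffix options first (`filter_record`) and only then expands the `tag` template through `rewrite_tag`. The stand-alone sketch below mirrors that order outside of Fluentd. `strip_prefix` is a hypothetical stand-in for what `remove_tag_prefix` does, the regular expression is simplified (it omits the optional `[n]` index that the plugin's pattern accepts), and `warn` replaces `$log.warn`.

```ruby
# Stand-alone sketch of the placeholder expansion; not the plugin class itself.
def rewrite_tag(rewritetag, tag)
  placeholder = { '${tag}' => tag, '__TAG__' => tag }
  rewritetag.gsub(/\$\{[a-z_]+\}|__[A-Z_]+__/) do |found|
    warn "unknown placeholder: #{found}" unless placeholder.key?(found)
    placeholder.fetch(found, '') # unknown placeholders expand to an empty string
  end
end

# Hypothetical stand-in for the remove_tag_prefix handling.
def strip_prefix(tag, prefix)
  tag.start_with?(prefix) ? tag[prefix.length..-1] : tag
end

stripped = strip_prefix('input.access', 'input.')   # prefix/suffix options run first
puts rewrite_tag('anonymized.${tag}', stripped)     # anonymized.access
puts rewrite_tag('anonymized.message', stripped)    # anonymized.message (static tag)
```

Both results correspond to the tags asserted by `test_emit_tag_placeholder` and `test_emit_tag_static` in this diff. In the plugin's version the block returns `placeholder[$1]`, which is `nil` for an unknown placeholder; `gsub` substitutes an empty string in that case, which the sketch makes explicit with `fetch`.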
@@ -77,7 +90,7 @@ class Fluent::AnonymizerOutput < Fluent::Output
77
90
  def anonymize(message, algorithm, salt)
78
91
  case algorithm
79
92
  when 'md5','sha1','sha256','sha384','sha512'
80
- DIGEST[algorithm].call.hexdigest(salt + message.to_s)
93
+ OpenSSL::HMAC.hexdigest(DIGEST[algorithm].call, salt, message.to_s)
81
94
  when 'ipaddr_mask'
82
95
  address = IPAddr.new(message)
83
96
  subnet = address.ipv4? ? @ipv4_mask_subnet : @ipv6_mask_subnet
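The `anonymize` hunk above replaces a digest over `salt + message` with `OpenSSL::HMAC.hexdigest`, using the salt as the HMAC key. The sketch below puts the two side by side and, purely for illustration, recomputes HMAC-SHA1 by hand following RFC 2104 to show what the library call calculates. The values of `salt` and `message` and the helper `manual_hmac_sha1` are illustrative assumptions, not plugin code.

```ruby
require 'digest'
require 'openssl'

salt    = 'mysaltstring' # illustrative salt, as in the README sample config
message = '12345'        # illustrative field value

# 0.1.0 style: plain digest of the salt concatenated with the message.
plain = Digest::SHA1.hexdigest(salt + message)

# 0.2.0 style: the salt becomes the HMAC key.
hmac = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha1'), salt, message)

# Hand-rolled HMAC-SHA1 (RFC 2104), only to illustrate the construction.
def manual_hmac_sha1(key, msg)
  block_size = 64 # SHA-1 block size in bytes
  key = Digest::SHA1.digest(key) if key.bytesize > block_size
  key = key.b.ljust(block_size, "\0".b)              # zero-pad the key
  ipad = key.bytes.map { |b| b ^ 0x36 }.pack('C*')   # inner padding
  opad = key.bytes.map { |b| b ^ 0x5c }.pack('C*')   # outer padding
  Digest::SHA1.hexdigest(opad + Digest::SHA1.digest(ipad + msg.b))
end

puts plain
puts hmac
puts manual_hmac_sha1(salt, message) == hmac # true
```

Because the construction differs, values hashed by 0.1.0 and 0.2.0 do not match even with the same salt, which is why every expected digest in the test hunks below changes. Records anonymized before and after the upgrade can no longer be joined on the hashed fields.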
data/test/plugin/test_out_anonymizer.rb CHANGED
@@ -18,29 +18,6 @@ class AnonymizerOutputTest < Test::Unit::TestCase
18
18
  add_tag_prefix anonymized.
19
19
  ]
20
20
 
21
- CONFIG_MULTI_KEYS = %[
22
- sha1_keys member_id, mail, telephone
23
- ipaddr_mask_keys host
24
- ipv4_mask_subnet 16
25
- remove_tag_prefix input.
26
- add_tag_prefix anonymized.
27
- ]
28
-
29
- CONFIG_NEST_VALUE = %[
30
- sha1_keys array,hash
31
- ipaddr_mask_keys host
32
- remove_tag_prefix input.
33
- add_tag_prefix anonymized.
34
- ]
35
-
36
- CONFIG_IPV6 = %[
37
- ipaddr_mask_keys host
38
- ipv4_mask_subnet 24
39
- ipv6_mask_subnet 104
40
- remove_tag_prefix input.
41
- add_tag_prefix anonymized.
42
- ]
43
-
44
21
  def create_driver(conf=CONFIG,tag='test')
45
22
  Fluent::Test::OutputTestDriver.new(Fluent::AnonymizerOutput, tag).configure(conf)
46
23
  end
@@ -74,18 +51,25 @@ class AnonymizerOutputTest < Test::Unit::TestCase
74
51
  p emits[0]
75
52
  assert_equal 'anonymized.access', emits[0][0] # tag
76
53
  assert_equal '10.102.3.0', emits[0][2]['host']
77
- assert_equal '9138bd41172f5485f7b6eee3afcd0d62', emits[0][2]['data_for_md5']
78
- assert_equal 'ee98db51658d38580b1cf788db19ad06e51a32f7', emits[0][2]['data_for_sha1']
79
- assert_equal 'd53d15615b19597b0f95a984a132ed5164ba9676bf3cb28e018d28feaa2ea6fd', emits[0][2]['data_for_sha256']
80
- assert_equal '6e9cd6d84ea371a72148b418f1a8cb2534da114bc2186d36ec6f14fd5c237b6f2e460f409dda89b7e42a14b7da8a8131', emits[0][2]['data_for_sha384']
81
- assert_equal 'adcf4e5d1e52f57f67d8b0cd85051158d7362103d7ed4cb6302445c2708eff4b17cb309cf5d09fd5cf76615c75652bd29d1707ce689a28e8700afd7a7439ef20', emits[0][2]['data_for_sha512']
54
+ assert_equal 'e738cbde82a514dc60582cd467c240ed', emits[0][2]['data_for_md5']
55
+ assert_equal '69cf099459c06b852ede96d39b710027727d13c6', emits[0][2]['data_for_sha1']
56
+ assert_equal '804d83b8c6a3e01498d40677652b084333196d8e548ee5a8710fbd0e1e115527', emits[0][2]['data_for_sha256']
57
+ assert_equal '6c90c389bbdfc210416b9318df3f526b4f218f8a8df3a67020353c35da22dc154460b18f22a8009a747b3ef2975acae7', emits[0][2]['data_for_sha384']
58
+ assert_equal 'cdbb897e6f3a092161bdb51164eb2996b75b00555f568219628ff15cd2929865d217af5dff9c32ddc908b75a89baec96b3e9a0da120e919f5246de0f1bc54c58', emits[0][2]['data_for_sha512']
82
59
  end
83
60
 
84
61
  def test_emit_multi_keys
85
- d1 = create_driver(CONFIG_MULTI_KEYS, 'input.access')
62
+ d1 = create_driver(%[
63
+ sha1_keys member_id, mail, telephone
64
+ ipaddr_mask_keys host, host2
65
+ ipv4_mask_subnet 16
66
+ remove_tag_prefix input.
67
+ add_tag_prefix anonymized.
68
+ ], 'input.access')
86
69
  d1.run do
87
70
  d1.emit({
88
71
  'host' => '10.102.3.80',
72
+ 'host2' => '10.102.3.80',
89
73
  'member_id' => '12345',
90
74
  'mail' => 'example@example.com',
91
75
  'telephone' => '00-0000-0000',
@@ -97,14 +81,20 @@ class AnonymizerOutputTest < Test::Unit::TestCase
97
81
  p emits[0]
98
82
  assert_equal 'anonymized.access', emits[0][0] # tag
99
83
  assert_equal '10.102.0.0', emits[0][2]['host']
100
- assert_equal '8cb2237d0679ca88db6464eac60da96345513964', emits[0][2]['member_id']
101
- assert_equal '914fec35ce8bfa1a067581032f26b053591ee38a', emits[0][2]['mail']
102
- assert_equal 'ce164718b94212332187eb8420903b46b334d609', emits[0][2]['telephone']
84
+ assert_equal '10.102.0.0', emits[0][2]['host2']
85
+ assert_equal '774472f0dc892f0b3299cae8dadacd0a74ba59d7', emits[0][2]['member_id']
86
+ assert_equal 'd7b728209f5dd8df10cecbced30394c3c7fc2c82', emits[0][2]['mail']
87
+ assert_equal 'a67f73c395105a358a03a0f127bf64b5495e7841', emits[0][2]['telephone']
103
88
  assert_equal 'signup', emits[0][2]['action']
104
89
  end
105
90
 
106
91
  def test_emit_nest_value
107
- d1 = create_driver(CONFIG_NEST_VALUE, 'input.access')
92
+ d1 = create_driver(%[
93
+ sha1_keys array,hash
94
+ ipaddr_mask_keys host
95
+ remove_tag_prefix input.
96
+ add_tag_prefix anonymized.
97
+ ], 'input.access')
108
98
  d1.run do
109
99
  d1.emit({
110
100
  'host' => '10.102.3.80',
@@ -117,12 +107,18 @@ class AnonymizerOutputTest < Test::Unit::TestCase
117
107
  p emits[0]
118
108
  assert_equal 'anonymized.access', emits[0][0] # tag
119
109
  assert_equal '10.102.3.0', emits[0][2]['host']
120
- assert_equal ["e3cbba8883fe746c6e35783c9404b4bc0c7ee9eb", "a4ac914c09d7c097fe1f4f96b897e625b6922069"], emits[0][2]['array']
121
- assert_equal '1a1903d78aed9403649d61cb21ba6b489249761b', emits[0][2]['hash']
110
+ assert_equal ["c1628fc0d473cb21b15607c10bdcad19d1a42e24", "ea87abc249f9f2d430edb816514bffeffd3e698e"], emits[0][2]['array']
111
+ assert_equal '28fe85deb0d1d39ee14c49c62bc4773b0338247b', emits[0][2]['hash']
122
112
  end
123
113
 
124
114
  def test_emit_ipv6
125
- d1 = create_driver(CONFIG_IPV6, 'input.access')
115
+ d1 = create_driver(%[
116
+ ipaddr_mask_keys host
117
+ ipv4_mask_subnet 24
118
+ ipv6_mask_subnet 104
119
+ remove_tag_prefix input.
120
+ add_tag_prefix anonymized.
121
+ ], 'input.access')
126
122
  d1.run do
127
123
  d1.emit({'host' => '10.102.3.80'})
128
124
  d1.emit({'host' => '0:0:0:0:0:FFFF:129.144.52.38'})
@@ -136,4 +132,39 @@ class AnonymizerOutputTest < Test::Unit::TestCase
136
132
  assert_equal '::ffff:129.0.0.0', emits[1][2]['host']
137
133
  assert_equal '2001:db8:0:8d3:0:8a2e::', emits[2][2]['host']
138
134
  end
135
+
136
+ def test_emit_tag_static
137
+ d1 = create_driver(%[
138
+ sha1_keys member_id
139
+ tag anonymized.message
140
+ ], 'input.access')
141
+ d1.run do
142
+ d1.emit({
143
+ 'member_id' => '12345',
144
+ })
145
+ end
146
+ emits = d1.emits
147
+ assert_equal 1, emits.length
148
+ p emits[0]
149
+ assert_equal 'anonymized.message', emits[0][0] # tag
150
+ assert_equal '774472f0dc892f0b3299cae8dadacd0a74ba59d7', emits[0][2]['member_id']
151
+ end
152
+
153
+ def test_emit_tag_placeholder
154
+ d1 = create_driver(%[
155
+ sha1_keys member_id
156
+ tag anonymized.${tag}
157
+ remove_tag_prefix input.
158
+ ], 'input.access')
159
+ d1.run do
160
+ d1.emit({
161
+ 'member_id' => '12345',
162
+ })
163
+ end
164
+ emits = d1.emits
165
+ assert_equal 1, emits.length
166
+ p emits[0]
167
+ assert_equal 'anonymized.access', emits[0][0] # tag
168
+ assert_equal '774472f0dc892f0b3299cae8dadacd0a74ba59d7', emits[0][2]['member_id']
169
+ end
139
170
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-anonymizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2014-01-07 00:00:00.000000000 Z
12
+ date: 2014-01-20 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -100,9 +100,9 @@ rubyforge_project:
100
100
  rubygems_version: 1.8.23
101
101
  signing_key:
102
102
  specification_version: 3
103
- summary: Fluentd filter output plugin to anonymize records with MD5/SHA1/SHA256/SHA384/SHA512
104
- algorithms. This data masking plugin protects privacy data such as ID, email, phone
105
- number, IPv4/IPv6 address and so on.
103
+ summary: Fluentd filter output plugin to anonymize records with HMAC of MD5/SHA1/SHA256/SHA384/SHA512
104
+ algorithms. This data masking plugin protects privacy data such as UserID, Email,
105
+ Phone number, IPv4/IPv6 address and so on.
106
106
  test_files:
107
107
  - test/helper.rb
108
108
  - test/plugin/test_out_anonymizer.rb