fluent-plugin-anonymizer 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,5 @@
1
+ *.gem
2
+ .bundle
3
+ Gemfile.lock
4
+ pkg/*
5
+ vendor/*
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ language: ruby
2
+
3
+ rvm:
4
+ - 2.0.0
5
+ - 1.9.3
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in fluent-plugin-anonymizer.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,14 @@
1
+ Copyright (c) 2013- Kentaro Yoshida
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
14
+
data/README.md ADDED
@@ -0,0 +1,100 @@
1
+ # fluent-plugin-anonymizer [![Build Status](https://travis-ci.org/y-ken/fluent-plugin-anonymizer.png?branch=master)](https://travis-ci.org/y-ken/fluent-plugin-anonymizer)
2
+
3
+ ## Overview
4
+
5
+ Fluentd filter output plugin that anonymizes records. This data-masking plugin protects private data such as IP addresses, IDs, email addresses, phone numbers and so on.
6
+
7
+ ## Installation
8
+
9
+ `````
10
+ ### native gem
11
+ gem install fluent-plugin-anonymizer
12
+
13
+ ### td-agent gem
14
+ /usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-anonymizer
15
+ `````
16
+
17
+ ## Tutorial
18
+
19
+ #### configuration
20
+
21
+ This sample hashes the `user_id`, `member_id` and `mail` fields with SHA-1, and masks the IP address in `host` to its /24 network using the `ipv4_mask_keys` and `ipv4_mask_subnet` options.
22
+
23
+ `````
24
+ <source>
25
+ type forward
26
+ port 24224
27
+ </source>
28
+
29
+ <match test.message>
30
+ type anonymizer
31
+ sha1_keys user_id, member_id, mail
32
+ ipv4_mask_keys host
33
+ ipv4_mask_subnet 24
34
+ remove_tag_prefix test.
35
+ add_tag_prefix anonymized.
36
+ </match>
37
+
38
+ <match anonymized.message>
39
+ type stdout
40
+ </match>
41
+ `````
42
+
43
+ #### result
44
+
45
+ `````
46
+ $ echo '{"host":"10.102.3.80","member_id":"12345", "mail":"example@example.com"}' | fluent-cat test.message
47
+
48
+ $ tail -f /var/log/td-agent/td-agent.log
49
+ 2013-11-19 18:30:21 +0900 anonymized.message: {"host":"10.102.3.0","member_id":"8cb2237d0679ca88db6464eac60da96345513964","mail":"914fec35ce8bfa1a067581032f26b053591ee38a"}
50
+ `````
51
+
52
+ ### Params
53
+
54
+ * `md5_keys` `sha1_keys` `sha256_keys` `sha384_keys` `sha512_keys`
55
+
56
+ Comma-separated list of record keys to hash with the corresponding algorithm. At least one `*_keys` option (including `ipv4_mask_keys`) is required.
57
+
58
+ * `hash_salt` (default: none)
59
+
60
+ The salt is prepended to each value before hashing and applies to all of the `md5_keys` `sha1_keys` `sha256_keys` `sha384_keys` `sha512_keys` settings.
61
+ Setting a hash salt is recommended to protect against rainbow table attacks.
62
+
63
+
64
+ * `ipv4_mask_keys`
65
+ * `ipv4_mask_subnet` (default: 24)
66
+
67
+ Mask the IPv4 addresses stored under one or more keys to their network address for the given prefix length (`ipv4_mask_subnet`). This makes it easy to aggregate by network.
68
+
69
+ | ipv4_mask_subnet | input | output |
70
+ |------------------|-----------------|---------------|
71
+ | 24 | 192.168.200.100 | 192.168.200.0 |
72
+ | 16 | 192.168.200.100 | 192.168.0.0 |
73
+ | 8 | 192.168.200.100 | 192.0.0.0 |
74
+
75
+ * `include_tag_key` (default: false)
76
+
77
+ Add the original tag name to the filtered record, using SetTagKeyMixin.
78
+
79
+ * `remove_tag_prefix`
80
+ * `remove_tag_suffix`
81
+ * `add_tag_prefix`
82
+ * `add_tag_suffix`
83
+
84
+ Rewrite the tag using HandleTagNameMixin. At least one of these four options is required.
85
+
86
+ ## Blog Articles
87
+
88
+ * http://y-ken.hatenablog.com/entry/fluent-plugin-anonymizer-has-released
89
+
90
+ ## TODO
91
+
92
+ Pull requests are very welcome!!
93
+
94
+ ## Copyright
95
+
96
+ Copyright © 2013- Kentaro Yoshida ([@yoshi_ken](https://twitter.com/yoshi_ken))
97
+
98
+ ## License
99
+
100
+ Apache License, Version 2.0
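A minimal standalone sketch of the value-level behavior described under Params, assuming only the Ruby standard library (`digest`, `ipaddr`). It follows the plugin's `anonymize` method: the salt is prepended to the value before hashing, and IPv4 values are reduced to their network address with `IPAddr#mask`. `anonymize_value` is a hypothetical helper, not part of the plugin's API.

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'ipaddr'

# Hypothetical helper mirroring the plugin's anonymize logic (not the plugin's API).
DIGESTS = {
  'md5'    => Digest::MD5,
  'sha1'   => Digest::SHA1,
  'sha256' => Digest::SHA256,
  'sha384' => Digest::SHA384,
  'sha512' => Digest::SHA512
}.freeze

def anonymize_value(value, algorithm, salt = '', subnet = 24)
  if DIGESTS.key?(algorithm)
    # The salt is prepended to the stringified value before hashing.
    DIGESTS.fetch(algorithm).hexdigest(salt + value.to_s)
  elsif algorithm == 'ipv4_mask'
    # Keep only the network part of the address for the given prefix length.
    IPAddr.new(value).mask(subnet).to_s
  else
    raise ArgumentError, "unknown algorithm: #{algorithm}"
  end
end

puts anonymize_value('12345', 'sha1')
puts anonymize_value('12345', 'sha1', 'test_salt_string')
[24, 16, 8].each do |bits|
  puts "#{bits}: #{anonymize_value('192.168.200.100', 'ipv4_mask', '', bits)}"
end
```

With an empty salt, the first line should equal the unsalted `member_id` hash in the README result, and the masked addresses should follow the `ipv4_mask_subnet` table. Positional defaults are used instead of keyword arguments so the sketch also loads on Ruby 1.9.3, which `.travis.yml` targets.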
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require "bundler/gem_tasks"
2
+ require "rake/testtask"
3
+ Rake::TestTask.new(:test) do |test|
4
+ test.libs << 'lib' << 'test'
5
+ test.pattern = 'test/**/test_*.rb'
6
+ test.verbose = true
7
+ end
8
+
9
+ task :default => :test
data/fluent-plugin-anonymizer.gemspec ADDED
@@ -0,0 +1,22 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "fluent-plugin-anonymizer"
7
+ spec.version = "0.0.1"
8
+ spec.authors = ["Kentaro Yoshida"]
9
+ spec.email = ["y.ken.studio@gmail.com"]
10
+ spec.summary = %q{Fluentd filter output plugin to anonymize records. This data masking plugin protects privacy data such as IP address, ID, email, phone number and so on.}
11
+ spec.homepage = "https://github.com/y-ken/fluent-plugin-anonymizer"
12
+ spec.license = "Apache License, Version 2.0"
13
+
14
+ spec.files = `git ls-files`.split($/)
15
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
16
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
17
+ spec.require_paths = ["lib"]
18
+
19
+ spec.add_development_dependency "bundler"
20
+ spec.add_development_dependency "rake"
21
+ spec.add_runtime_dependency "fluentd"
22
+ end
data/lib/fluent/plugin/out_anonymizer.rb ADDED
@@ -0,0 +1,79 @@
1
+ class Fluent::AnonymizerOutput < Fluent::Output
2
+ Fluent::Plugin.register_output('anonymizer', self)
3
+
4
+ HASH_ALGORITHM = %w(md5 sha1 sha256 sha384 sha512 ipv4_mask)
5
+ config_param :hash_salt, :string, :default => ''
6
+ config_param :ipv4_mask_subnet, :integer, :default => 24
7
+
8
+ include Fluent::HandleTagNameMixin
9
+
10
+ include Fluent::SetTagKeyMixin
11
+ config_set_default :include_tag_key, false
12
+
13
+ DIGEST = {
14
+ "md5" => Proc.new { Digest::MD5 },
15
+ "sha1" => Proc.new { Digest::SHA1 },
16
+ "sha256" => Proc.new { Digest::SHA256 },
17
+ "sha384" => Proc.new { Digest::SHA384 },
18
+ "sha512" => Proc.new { Digest::SHA512 }
19
+ }
20
+
21
+ def initialize
22
+ require 'digest/sha2'
23
+ require 'ipaddr'
24
+ super
25
+ end
26
+
27
+ def configure(conf)
28
+ super
29
+
30
+ @hash_keys = Hash.new
31
+ conf.keys.select{|k| k =~ /_keys$/}.each do |key|
32
+ hash_algorithm_name = key.sub('_keys','')
33
+ raise Fluent::ConfigError, "anonymizer: unsupported key #{hash_algorithm_name}" unless HASH_ALGORITHM.include?(hash_algorithm_name)
34
+ conf[key].gsub(' ', '').split(',').each do |record_key|
35
+ @hash_keys.store(record_key, hash_algorithm_name)
36
+ end
37
+ end
38
+
39
+ if @hash_keys.count < 1
40
+ raise Fluent::ConfigError, "anonymizer: missing hash keys setting."
41
+ end
42
+
43
+ if ( !@remove_tag_prefix && !@remove_tag_suffix && !@add_tag_prefix && !@add_tag_suffix )
44
+ raise Fluent::ConfigError, "anonymizer: missing remove_tag_prefix, remove_tag_suffix, add_tag_prefix or add_tag_suffix."
45
+ end
46
+ end
47
+
48
+ def emit(tag, es, chain)
49
+ es.each do |time, record|
50
+ record = filter_anonymize_record(record)
51
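+ filter_record(emit_tag = tag.dup, time, record) # per-record tag copy: HandleTagNameMixin's filter_record may rewrite the tag in place
52
+ Fluent::Engine.emit(emit_tag, time, record)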
53
+ end
54
+ chain.next
55
+ end
56
+
57
+ def filter_anonymize_record(record)
58
+ @hash_keys.each do |hash_key, hash_algorithm|
59
+ next unless record.include?(hash_key)
60
+ if record[hash_key].is_a?(Array)
61
+ record[hash_key] = record[hash_key].collect { |v| anonymize(v, hash_algorithm, @hash_salt) }
62
+ else
63
+ record[hash_key] = anonymize(record[hash_key], hash_algorithm, @hash_salt)
64
+ end
65
+ end
66
+ return record
67
+ end
68
+
69
+ def anonymize(message, algorithm, salt)
70
+ case algorithm
71
+ when 'md5','sha1','sha256','sha384','sha512'
72
+ DIGEST[algorithm].call.hexdigest(salt + message.to_s)
73
+ when 'ipv4_mask'
74
+ IPAddr.new(message).mask(@ipv4_mask_subnet).to_s
75
+ else
76
+ $log.warn "anonymizer: unknown algorithm #{algorithm} was specified."
77
+ end
78
+ end
79
+ end
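The `configure` method above collects every `*_keys` parameter into one map from record key to algorithm. A minimal standalone sketch of that step, assuming the configuration arrives as a plain `Hash` of strings (in the plugin it is Fluentd's config element); `build_hash_keys` is a hypothetical name, and `ArgumentError` stands in for `Fluent::ConfigError`.

```ruby
SUPPORTED = %w(md5 sha1 sha256 sha384 sha512 ipv4_mask).freeze

# Hypothetical helper: turns {"sha1_keys" => "member_id, mail"} into
# {"member_id" => "sha1", "mail" => "sha1"}.
def build_hash_keys(conf)
  hash_keys = {}
  conf.each do |param, value|
    next unless param.end_with?('_keys')
    algorithm = param.sub(/_keys\z/, '')
    raise ArgumentError, "unsupported key #{algorithm}" unless SUPPORTED.include?(algorithm)
    # Spaces are stripped, so "a, b" and "a,b" are equivalent.
    value.delete(' ').split(',').each { |record_key| hash_keys[record_key] = algorithm }
  end
  raise ArgumentError, 'missing hash keys setting' if hash_keys.empty?
  hash_keys
end

conf = {
  'sha1_keys'        => 'member_id, mail, telephone',
  'ipv4_mask_keys'   => 'host',
  'ipv4_mask_subnet' => '16'
}
p build_hash_keys(conf)
```

As with the plugin's `Hash#store`, a record key listed under two algorithms keeps whichever assignment comes last, and non-`_keys` parameters such as `ipv4_mask_subnet` are ignored here.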
data/test/helper.rb ADDED
@@ -0,0 +1,28 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ begin
4
+ Bundler.setup(:default, :development)
5
+ rescue Bundler::BundlerError => e
6
+ $stderr.puts e.message
7
+ $stderr.puts "Run `bundle install` to install missing gems"
8
+ exit e.status_code
9
+ end
10
+ require 'test/unit'
11
+
12
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
13
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
14
+ require 'fluent/test'
15
+ unless ENV.has_key?('VERBOSE')
16
+ nulllogger = Object.new
17
+ nulllogger.instance_eval {|obj|
18
+ def method_missing(method, *args)
19
+ # pass
20
+ end
21
+ }
22
+ $log = nulllogger
23
+ end
24
+
25
+ require 'fluent/plugin/out_anonymizer'
26
+
27
+ class Test::Unit::TestCase
28
+ end
data/test/plugin/test_out_anonymizer.rb ADDED
@@ -0,0 +1,90 @@
1
+ require 'helper'
2
+
3
+ class AnonymizerOutputTest < Test::Unit::TestCase
4
+ def setup
5
+ Fluent::Test.setup
6
+ end
7
+
8
+ CONFIG = %[
9
+ md5_keys data_for_md5
10
+ sha1_keys data_for_sha1
11
+ sha256_keys data_for_sha256
12
+ sha384_keys data_for_sha384
13
+ sha512_keys data_for_sha512
14
+ hash_salt test_salt_string
15
+ ipv4_mask_keys host
16
+ ipv4_mask_subnet 24
17
+ remove_tag_prefix input.
18
+ add_tag_prefix anonymized.
19
+ ]
20
+
21
+ CONFIG_MULTI_KEYS = %[
22
+ sha1_keys member_id, mail, telephone
23
+ ipv4_mask_keys host
24
+ ipv4_mask_subnet 16
25
+ remove_tag_prefix input.
26
+ add_tag_prefix anonymized.
27
+ ]
28
+
29
+ def create_driver(conf=CONFIG,tag='test')
30
+ Fluent::Test::OutputTestDriver.new(Fluent::AnonymizerOutput, tag).configure(conf)
31
+ end
32
+
33
+ def test_configure
34
+ assert_raise(Fluent::ConfigError) {
35
+ d = create_driver('')
36
+ }
37
+ assert_raise(Fluent::ConfigError) {
38
+ d = create_driver('unknown_keys')
39
+ }
40
+ d = create_driver(CONFIG)
41
+ puts d.instance.inspect
42
+ assert_equal 'test_salt_string', d.instance.config['hash_salt']
43
+ end
44
+
45
+ def test_emit
46
+ d1 = create_driver(CONFIG, 'input.access')
47
+ d1.run do
48
+ d1.emit({
49
+ 'host' => '10.102.3.80',
50
+ 'data_for_md5' => '12345',
51
+ 'data_for_sha1' => '12345',
52
+ 'data_for_sha256' => '12345',
53
+ 'data_for_sha384' => '12345',
54
+ 'data_for_sha512' => '12345'
55
+ })
56
+ end
57
+ emits = d1.emits
58
+ assert_equal 1, emits.length
59
+ p emits[0]
60
+ assert_equal 'anonymized.access', emits[0][0] # tag
61
+ assert_equal '10.102.3.0', emits[0][2]['host']
62
+ assert_equal '9138bd41172f5485f7b6eee3afcd0d62', emits[0][2]['data_for_md5']
63
+ assert_equal 'ee98db51658d38580b1cf788db19ad06e51a32f7', emits[0][2]['data_for_sha1']
64
+ assert_equal 'd53d15615b19597b0f95a984a132ed5164ba9676bf3cb28e018d28feaa2ea6fd', emits[0][2]['data_for_sha256']
65
+ assert_equal '6e9cd6d84ea371a72148b418f1a8cb2534da114bc2186d36ec6f14fd5c237b6f2e460f409dda89b7e42a14b7da8a8131', emits[0][2]['data_for_sha384']
66
+ assert_equal 'adcf4e5d1e52f57f67d8b0cd85051158d7362103d7ed4cb6302445c2708eff4b17cb309cf5d09fd5cf76615c75652bd29d1707ce689a28e8700afd7a7439ef20', emits[0][2]['data_for_sha512']
67
+ end
68
+
69
+ def test_emit_multi_keys
70
+ d1 = create_driver(CONFIG_MULTI_KEYS, 'input.access')
71
+ d1.run do
72
+ d1.emit({
73
+ 'host' => '10.102.3.80',
74
+ 'member_id' => '12345',
75
+ 'mail' => 'example@example.com',
76
+ 'telephone' => '00-0000-0000',
77
+ 'action' => 'signup'
78
+ })
79
+ end
80
+ emits = d1.emits
81
+ assert_equal 1, emits.length
82
+ p emits[0]
83
+ assert_equal 'anonymized.access', emits[0][0] # tag
84
+ assert_equal '10.102.0.0', emits[0][2]['host']
85
+ assert_equal '8cb2237d0679ca88db6464eac60da96345513964', emits[0][2]['member_id']
86
+ assert_equal '914fec35ce8bfa1a067581032f26b053591ee38a', emits[0][2]['mail']
87
+ assert_equal 'ce164718b94212332187eb8420903b46b334d609', emits[0][2]['telephone']
88
+ assert_equal 'signup', emits[0][2]['action']
89
+ end
90
+ end
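The tests above exercise `filter_anonymize_record` through the Fluentd test driver. The same record-level behavior can be sketched without Fluentd, assuming a key map like the one built in `configure`: keys missing from the record are skipped, array values are anonymized element by element, and unlisted fields pass through unchanged. `anonymize_record` is a hypothetical name, and only SHA-1 and IPv4 masking are included to keep the sketch short.

```ruby
require 'digest/sha1'
require 'ipaddr'

# Hypothetical standalone version of the plugin's per-record step.
# Like the plugin, it mutates the record in place and returns it.
def anonymize_record(record, hash_keys, salt = '', subnet = 24)
  hash_keys.each do |key, algorithm|
    next unless record.key?(key) # absent keys are left alone
    anonymize = lambda do |v|
      case algorithm
      when 'sha1'      then Digest::SHA1.hexdigest(salt + v.to_s)
      when 'ipv4_mask' then IPAddr.new(v).mask(subnet).to_s
      else raise ArgumentError, "unsupported algorithm #{algorithm}"
      end
    end
    value = record[key]
    # Arrays are anonymized element by element; scalars directly.
    record[key] = value.is_a?(Array) ? value.map(&anonymize) : anonymize.call(value)
  end
  record
end

record = {
  'host'      => '10.102.3.80',
  'member_id' => '12345',
  'mail'      => ['example@example.com', 'other@example.com'],
  'action'    => 'signup'
}
keys = { 'member_id' => 'sha1', 'mail' => 'sha1', 'telephone' => 'sha1', 'host' => 'ipv4_mask' }
p anonymize_record(record, keys, '', 16)
```

Here `telephone` is configured but absent, so the record gains no `telephone` field, mirroring the `next unless record.include?(hash_key)` guard in the plugin.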
metadata ADDED
@@ -0,0 +1,107 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: fluent-plugin-anonymizer
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Kentaro Yoshida
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-11-19 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: bundler
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :development
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: rake
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :development
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ - !ruby/object:Gem::Dependency
47
+ name: fluentd
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>='
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ description:
63
+ email:
64
+ - y.ken.studio@gmail.com
65
+ executables: []
66
+ extensions: []
67
+ extra_rdoc_files: []
68
+ files:
69
+ - .gitignore
70
+ - .travis.yml
71
+ - Gemfile
72
+ - LICENSE
73
+ - README.md
74
+ - Rakefile
75
+ - fluent-plugin-anonymizer.gemspec
76
+ - lib/fluent/plugin/out_anonymizer.rb
77
+ - test/helper.rb
78
+ - test/plugin/test_out_anonymizer.rb
79
+ homepage: https://github.com/y-ken/fluent-plugin-anonymizer
80
+ licenses:
81
+ - Apache License, Version 2.0
82
+ post_install_message:
83
+ rdoc_options: []
84
+ require_paths:
85
+ - lib
86
+ required_ruby_version: !ruby/object:Gem::Requirement
87
+ none: false
88
+ requirements:
89
+ - - ! '>='
90
+ - !ruby/object:Gem::Version
91
+ version: '0'
92
+ required_rubygems_version: !ruby/object:Gem::Requirement
93
+ none: false
94
+ requirements:
95
+ - - ! '>='
96
+ - !ruby/object:Gem::Version
97
+ version: '0'
98
+ requirements: []
99
+ rubyforge_project:
100
+ rubygems_version: 1.8.23
101
+ signing_key:
102
+ specification_version: 3
103
+ summary: Fluentd filter output plugin to anonymize records. This data masking plugin
104
+ protects privacy data such as IP address, ID, email, phone number and so on.
105
+ test_files:
106
+ - test/helper.rb
107
+ - test/plugin/test_out_anonymizer.rb