fluent-plugin-woothee 0.2.2 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 6adc6d94e1e14d85619c6f5f320ff6c9e1b7ec77
- data.tar.gz: 76a0c6592d1e33e6625bd8cb58839d64d1f7344d
+ metadata.gz: cf79bb6e1d07bdddb4cae76b7ae7674fbe7b460f
+ data.tar.gz: 6d1c7933f26b6f2b7bb7df6aacb11f645ca6f649
  SHA512:
- metadata.gz: 0446a8bc15e2e7cca56d407c0085574fd821ae24dfec837b0526b509e50f5c6baa18cf5b80c18dc595b4abc5775e76f8fbb6ec6ecf893c92c1fe0a7df25c4603
- data.tar.gz: 06e8582442bc4409103f92e3b40eadd4b41e0d9cf51731ebdb96e95d09ca4cd5204f6807ed8148d3daeb2a0b77403cd167d73ef8ec565485d8575018905fe49f
+ metadata.gz: 2b93cc550a0584bd74421878da8efd1da5d97c7e8f7fe09f5e98bf5cabd0b4d30c283220a5503fb5298304358a5a6986b592039394f4e06bdfef0b446f0d7034
+ data.tar.gz: e0444b74e92eb0bdf2cb6968b82d9998bed32b29ec025aa8a6c6272a62185e050f2b44610e9cb00a285144dc2185ad15867157c6644d7e02cb4b14c5397dbd41
@@ -1,7 +1,6 @@
  language: ruby
  sudo: false
  rvm:
- - 2.0.0
  - 2.1
  - 2.2
  - 2.3.0
data/README.md CHANGED
@@ -1,133 +1,84 @@
  # fluent-plugin-woothee
 
- ## WootheeOutput
-
- 'fluent-plugin-woothee' is a Fluentd plugin to parse UserAgent strings and to filter/drop specified categories of user terminals (like 'pc', 'smartphone' and so on).
+ 'fluent-plugin-woothee' is a Fluentd filter plugin to parse UserAgent strings and to filter/drop specified categories of user terminals (like 'pc', 'smartphone' and so on).
 
  'woothee' is multi-language user-agent strings parser project. See: https://github.com/woothee/woothee
 
  ## Configuration
 
- To add woothee parser result into matched messages:
-
- <match input.**>
- type woothee
- key_name agent
- remove_prefix input
- add_prefix merged
- merge_agent_info yes
- </match>
-
- Output messages with tag 'merged.**' has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
-
- <match input.**>
- type woothee
- key_name agent
- remove_prefix input
- add_prefix merged
- merge_agent_info yes
- out_key_name ua_name
- out_key_category ua_category
- out_key_os ua_os
- out_key_os_version ua_os_version
- out_key_version ua_version
- out_key_vendor ua_vendor
- </match>
-
- To re-emit messages with specified user-agent categories (and merge woothee parser result), configure like this:
-
- <match input.**>
- type woothee
- key_name agent
- filter_categories pc,smartphone,mobilephone,appliance
- remove_prefix input
- add_prefix merged
- merge_agent_info yes
- </match>
-
- Or, you can specify categories to drop (and not to merge woothee result):
-
- <match input.**>
- type woothee
- key_name agent
- drop_categories crawler
- remove_prefix input
- add_prefix merged
- merge_agent_info false # default
- </match>
-
- ### Fast Crawler Filter
-
- If you want to drop __almost__ all of messages with crawler's user-agent, and not to merge woothee result, you just specify plugin type:
-
- <match input.**>
- type woothee_fast_crawler_filter
- key_name useragent
- tag filtered
- </match>
-
- 'fluent-plugin-woothee' uses 'Woothee.is_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
- If you want to drop all of crawlers completely, specify 'type woothee' and 'drop_categories crawler'.
-
- ## WootheeFilter
-
- This is filter version of 'fluent-plugin-woothee'.
- Note that this filter version does not have rewrite tag functionality.
-
- ## Configuration
-
- To add woothee parser result into filtered messages:
-
- <filter input.**>
- type woothee
- key_name agent
- merge_agent_info yes
- </filter>
-
- Filtered messages with non-modified tag has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
-
- <filter input.**>
- type woothee
- key_name agent
- merge_agent_info yes
- out_key_name ua_name
- out_key_category ua_category
- out_key_os ua_os
- out_key_os_version ua_os_version
- out_key_version ua_version
- out_key_vendor ua_vendor
- </filter>
-
- To filter messages with specified user-agent categories (and merge woothee parser result), configure like this:
-
- <filter input.**>
- type woothee
- key_name agent
- filter_categories pc,smartphone,mobilephone,appliance
- merge_agent_info yes
- </filter>
+ To add woothee parser result into messages:
+
+ <label @accesslog>
+ <filter input.**>
+ @type woothee
+ key_name agent
+ merge_agent_info yes
+ </filter>
+ <match ...>
+ </match>
+ </label>
+
+ Result messages has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
+
+ <label @accesslog>
+ <filter input.**>
+ @type woothee
+ key_name agent
+ merge_agent_info yes
+
+ out_key_name ua_name
+ out_key_category ua_category
+ out_key_os ua_os
+ out_key_os_version ua_os_version
+ out_key_version ua_version
+ out_key_vendor ua_vendor
+ </filter>
+ <match ...>
+ </match>
+ </label>
+
+ To pass messages only with specified user-agent categories (and merge woothee parser result), configure like this:
+
+ <label @accesslog>
+ <filter input.**>
+ @type woothee
+ key_name agent
+ merge_agent_info yes
+ filter_categories pc,smartphone,mobilephone,appliance
+ </filter> # logs of other categories will be dropped
+
+ # ...
+ </label>
 
  Or, you can specify categories to drop (and not to merge woothee result):
 
- <filter input.**>
- type woothee
- key_name agent
- drop_categories crawler
- merge_agent_info false # default
- </filter>
+ <label @accesslog>
+ <filter input.**>
+ @type woothee
+ key_name agent
+ merge_agent_info false # default
+ drop_categories crawler
+ </filter>
+
+ # ...
+ </label>
 
  ### Fast Crawler Filter
 
  If you want to drop __almost__ all of messages with crawler's user-agent, and not to merge woothee result, you just specify plugin type:
 
  <filter input.**>
- type woothee_fast_crawler_filter
+ @type woothee_fast_crawler_filter
  key_name useragent
  </filter>
 
- 'fluent-plugin-woothee' uses 'Woothee.is_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
+ 'fluent-plugin-woothee' uses 'Woothee.is\_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
  If you want to drop all of crawlers completely, specify 'type woothee' and 'drop_categories crawler'.
 
+ ### Output plugin
+
+ The output version of woothee plugin is not supported in versions for Fluentd v0.14.
+
  ## TODO
 
  * patches welcome!
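
The README changes above describe two mutually exclusive modes: `filter_categories` passes only the listed categories, while `drop_categories` removes the listed ones. As a rough illustration of that decision, here is a minimal sketch in plain Ruby. The helper name `keep_category?` is hypothetical and not part of the plugin, and the comma split is done by hand rather than through Fluentd's own config parsing.

```ruby
# Hypothetical helper, not part of the plugin: decides whether a record
# with the given woothee category would be kept.
def keep_category?(category, filter_categories: [], drop_categories: [])
  if !filter_categories.empty? && !drop_categories.empty?
    raise ArgumentError, "both of 'filter' and 'drop' categories specified"
  end
  # Filter mode: keep only the listed categories.
  return filter_categories.include?(category) unless filter_categories.empty?
  # Drop mode (or no list at all): keep everything that is not listed.
  !drop_categories.include?(category)
end

# A comma-separated value like the README's filter_categories line,
# split by hand here; Fluentd's :array parsing is not reproduced.
allowed = 'pc,smartphone,mobilephone,appliance'.split(',').map(&:to_sym)

puts keep_category?(:pc, filter_categories: allowed)
puts keep_category?(:crawler, filter_categories: allowed)
puts keep_category?(:crawler, drop_categories: [:crawler])
```

Categories are compared as symbols here because the plugin code in this diff converts both lists with `map(&:to_sym)` before use.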
@@ -2,7 +2,7 @@
 
  Gem::Specification.new do |gem|
  gem.name = "fluent-plugin-woothee"
- gem.version = "0.2.2"
+ gem.version = "1.0.0"
  gem.authors = ["TAGOMORI Satoshi"]
  gem.email = ["tagomoris@gmail.com"]
  gem.description = %q{parsing by Project Woothee. See https://github.com/woothee/woothee }
@@ -17,6 +17,6 @@ Gem::Specification.new do |gem|
 
  gem.add_development_dependency "rake"
  gem.add_development_dependency "test-unit", "~> 3.0.2"
- gem.add_runtime_dependency "fluentd", "< 0.14.0"
+ gem.add_runtime_dependency "fluentd", ">= 0.14.0"
  gem.add_runtime_dependency "woothee", ">= 1.0.0"
  end
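
The runtime dependency flips from `< 0.14.0` to `>= 0.14.0`, so the two releases accept disjoint sets of Fluentd versions. A small sketch using RubyGems' own requirement classes shows the effect; the Fluentd version strings in the loop are illustrative picks, not taken from this diff.

```ruby
require 'rubygems'

OLD_REQUIREMENT = Gem::Requirement.new('< 0.14.0')   # declared by 0.2.2
NEW_REQUIREMENT = Gem::Requirement.new('>= 0.14.0')  # declared by 1.0.0

# Illustrative Fluentd version strings, chosen for this example only.
%w[0.12.40 0.14.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "fluentd #{v}: satisfies 0.2.2=#{OLD_REQUIREMENT.satisfied_by?(version)}, " \
       "satisfies 1.0.0=#{NEW_REQUIREMENT.satisfied_by?(version)}"
end
```

In practice this means an environment pinned to Fluentd v0.12 has to stay on the 0.2.x line of this plugin.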
@@ -1,46 +1,47 @@
- class Fluent::WootheeFilter < Fluent::Filter
+ require 'fluent/plugin/filter'
+ require 'woothee'
+
+ class Fluent::Plugin::WootheeFilter < Fluent::Plugin::Filter
  Fluent::Plugin.register_filter('woothee', self)
  Fluent::Plugin.register_filter('woothee_fast_crawler_filter', self)
 
- config_param :fast_crawler_filter_mode, :bool, :default => false
+ config_param :fast_crawler_filter_mode, :bool, default: false
 
  config_param :key_name, :string
 
- config_param :filter_categories, :default => [] do |val|
- val.split(',').map(&:to_sym)
- end
- config_param :drop_categories, :default => [] do |val|
- val.split(',').map(&:to_sym)
- end
+ config_param :filter_categories, :array, value_type: :string, default: []
+ config_param :drop_categories, :array, value_type: :string, default: []
+
  attr_accessor :mode
 
- config_param :merge_agent_info, :bool, :default => false
- config_param :out_key_name, :string, :default => 'agent_name'
- config_param :out_key_category, :string, :default => 'agent_category'
- config_param :out_key_os, :string, :default => 'agent_os'
- config_param :out_key_os_version, :string, :default => nil # supress output
- config_param :out_key_version, :string, :default => nil # supress output
- config_param :out_key_vendor, :string, :default => nil # supress output
-
- def initialize
- super
- require 'woothee'
- end
+ config_param :merge_agent_info, :bool, default: false
+ config_param :out_key_name, :string, default: 'agent_name'
+ config_param :out_key_category, :string, default: 'agent_category'
+ config_param :out_key_os, :string, default: 'agent_os'
+ config_param :out_key_os_version, :string, default: nil # supress output in default
+ config_param :out_key_version, :string, default: nil # supress output in default
+ config_param :out_key_vendor, :string, default: nil # supress output in default
 
  def configure(conf)
+ specified_type_name = conf['@type']
+
  super
 
- if conf['type'] == 'woothee_fast_crawler_filter' or @fast_crawler_filter_mode
+ @filter_categories = @filter_categories.map(&:to_sym)
+ @drop_categories = @drop_categories.map(&:to_sym)
+
+ if specified_type_name == 'woothee_fast_crawler_filter' || @fast_crawler_filter_mode
  @fast_crawler_filter_mode = true
 
- if @filter_categories.size > 0 or @drop_categories.size > 0 or @merge_agent_info
+ if @filter_categories.size > 0 || @drop_categories.size > 0 || @merge_agent_info
  raise Fluent::ConfigError, "fast_crawler_filter cannot be specified with filter/drop/merge options"
  end
 
+ define_singleton_method(:filter, method(:filter_fast_crawler))
  return
  end
 
- if @filter_categories.size > 0 and @drop_categories.size > 0
+ if @filter_categories.size > 0 && @drop_categories.size > 0
  raise Fluent::ConfigError, "both of 'filter' and 'drop' categories specified"
  elsif @filter_categories.size > 0
  unless @filter_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
@@ -56,52 +57,47 @@ class Fluent::WootheeFilter < Fluent::Filter
  @mode = :through
  end
 
- if @mode == :through and not @merge_agent_info
+ if @mode == :through && ! @merge_agent_info
  raise Fluent::ConfigError, "configured not to do nothing (not to do either filter/drop nor addition of parser result)"
  end
- end
 
- def fast_crawler_filter_stream(tag, es)
- new_es = Fluent::MultiEventStream.new
+ define_singleton_method(:filter, method(:filter_standard))
+ end
 
- es.each do |time,record|
- unless Woothee.is_crawler(record[@key_name] || '')
- new_es.add(time, record.dup)
- end
+ def filter(tag, time, record)
+ # dynamically overwritten by #configure
+ if @fast_crawler_filter_mode
+ filter_fast_crawler(tag, time, record)
+ else
+ filter_standard(tag, time, record)
  end
- new_es
  end
 
- def normal_filter_stream(tag, es)
- new_es = Fluent::MultiEventStream.new
-
- es.each do |time,record|
- parsed = Woothee.parse(record[@key_name] || '')
-
- category = parsed[Woothee::ATTRIBUTE_CATEGORY]
- next if @mode == :filter and not @filter_categories.include?(category)
- next if @mode == :drop and @drop_categories.include?(category)
-
- if @merge_agent_info
- record = record.merge({
- @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
- @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
- @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
- })
- record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
- record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
- record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
- end
- new_es.add(time, record.dup)
+ def filter_fast_crawler(tag, time, record)
+ if Woothee.is_crawler(record[@key_name] || '')
+ nil
+ else
+ record
  end
- new_es
  end
 
- def filter_stream(tag, es)
- if @fast_crawler_filter_mode
- fast_crawler_filter_stream(tag, es)
- else
- normal_filter_stream(tag, es)
+ def filter_standard(tag, time, record)
+ parsed = Woothee.parse(record[@key_name] || '')
+
+ category = parsed[Woothee::ATTRIBUTE_CATEGORY]
+ return nil if @mode == :filter && !@filter_categories.include?(category)
+ return nil if @mode == :drop && @drop_categories.include?(category)
+
+ if @merge_agent_info
+ record = record.merge({
+ @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
+ @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
+ @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
+ })
+ record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
+ record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
+ record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
  end
+ record
  end
- end if defined?(Fluent::Filter)
+ end
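
The rewritten filter moves from building a new event stream in `filter_stream` to a per-record `filter(tag, time, record)` that returns `nil` to drop an event, and it binds the concrete implementation once in `#configure` via `define_singleton_method`. The following is a minimal sketch of that pattern in plain Ruby, with no Fluentd or woothee dependency. `SketchFilter`, `CRAWLER_PATTERN` and the local `filter_stream` loop are stand-ins invented for this example; the regular expression is far cruder than `Woothee.is_crawler`.

```ruby
# Stand-in for a filter plugin; every name here is hypothetical.
class SketchFilter
  CRAWLER_PATTERN = /bot|slurp|yeti/i # crude placeholder for Woothee.is_crawler

  def initialize(key_name:, fast_mode:)
    @key_name = key_name
    # Bind the per-record entry point once, as #configure does in the diff.
    target = fast_mode ? :filter_fast_crawler : :filter_standard
    define_singleton_method(:filter, method(target))
  end

  # Returning nil signals "drop this record".
  def filter_fast_crawler(_tag, _time, record)
    CRAWLER_PATTERN.match?(record[@key_name] || '') ? nil : record
  end

  # Placeholder for the parse-and-merge step of the real plugin.
  def filter_standard(_tag, _time, record)
    record.merge('agent_category' => 'UNKNOWN')
  end

  # Mimics how a host would treat the return value of #filter.
  def filter_stream(tag, events)
    events.each_with_object([]) do |(time, record), out|
      result = filter(tag, time, record)
      out << [time, result] if result
    end
  end
end

f = SketchFilter.new(key_name: 'useragent', fast_mode: true)
events = [
  [1, { 'useragent' => 'Googlebot/2.1' }],
  [2, { 'useragent' => 'Mozilla/5.0 (Windows NT 6.1)' }]
]
p f.filter_stream('test', events)
```

Binding the method once avoids re-checking the mode flag for every record, which is presumably why the diff keeps the plain `filter` body only as a fallback.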
@@ -22,7 +22,6 @@ unless ENV.has_key?('VERBOSE')
  $log = nulllogger
  end
 
- require 'fluent/plugin/out_woothee'
  require 'fluent/plugin/filter_woothee'
 
  class Test::Unit::TestCase
@@ -1,4 +1,5 @@
  require 'helper'
+ require 'fluent/test/driver/filter'
 
  class Fluent::WootheeFilterTest < Test::Unit::TestCase
  # fast crawler filter
@@ -36,13 +37,11 @@ drop_categories crawler,misc
  ]
 
  def setup
- omit("Use fluentd v0.12 or later") unless defined?(Fluent::Filter)
-
  Fluent::Test.setup
  end
 
- def create_driver(conf=CONFIG1,tag='test')
- Fluent::Test::FilterTestDriver.new(Fluent::WootheeFilter, tag).configure(conf)
+ def create_driver(conf=CONFIG1)
+ Fluent::Test::Driver::Filter.new(Fluent::Plugin::WootheeFilter).configure(conf)
  end
 
  class TestConfigure < self
@@ -105,53 +104,51 @@ drop_categories crawler,misc
  def test_filter_fast_crawler_filter_stream
  d = create_driver CONFIG0
  time = Time.parse('2012-07-20 16:19:00').to_i
- d.run do
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1}, time)
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2}, time)
- d.filter({'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3}, time)
- d.filter({'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4}, time)
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5}, time)
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6}, time)
- d.filter({'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7}, time)
- d.filter({'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8}, time)
+ d.run(default_tag: 'test') do
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1})
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2})
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3})
+ d.feed(time, {'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4})
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5})
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6})
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7})
+ d.feed(time, {'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8})
  end
 
- filtered = d.filtered_as_array
+ filtered = d.filtered
  assert_equal 4, filtered.size
 
- assert_equal 'test', filtered[0][0]
- assert_equal time, filtered[0][1]
- assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', filtered[0][2]['useragent']
- assert_equal 3, filtered[0][2]['value']
- assert_equal 2, filtered[0][2].keys.size
+ assert_equal time, filtered[0][0]
+ assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', filtered[0][1]['useragent']
+ assert_equal 3, filtered[0][1]['value']
+ assert_equal 2, filtered[0][1].keys.size
 
- assert_equal 4, filtered[1][2]['value']
- assert_equal 6, filtered[2][2]['value']
- assert_equal 7, filtered[3][2]['value']
+ assert_equal 4, filtered[1][1]['value']
+ assert_equal 6, filtered[2][1]['value']
+ assert_equal 7, filtered[3][1]['value']
  end
 
  # through & merge
  def test_filter_through
- d = create_driver(CONFIG1, 'test.message')
+ d = create_driver(CONFIG1)
  time = Time.parse('2012-07-20 16:40:30').to_i
- d.run do
- d.filter({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
- d.filter({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
- d.filter({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
- d.filter({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
- d.filter({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
- d.filter({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
- d.filter({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
- d.filter({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
+ d.run(default_tag: 'test.message') do
+ d.feed(time, {'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
+ d.feed(time, {'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
+ d.feed(time, {'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
+ d.feed(time, {'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
+ d.feed(time, {'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
+ d.feed(time, {'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
+ d.feed(time, {'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
+ d.feed(time, {'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
  end
 
- filtered = d.filtered_as_array
+ filtered = d.filtered
  assert_equal 8, filtered.size
- assert_equal 'test.message', filtered[0][0]
- assert_equal time, filtered[0][1]
+ assert_equal time, filtered[0][0]
 
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
- m = filtered[0][2]
+ m = filtered[0][1]
  assert_equal 0, m['value']
  assert_equal 'Internet Explorer', m['agent_name']
  assert_equal 'pc', m['agent_category']
@@ -159,49 +156,49 @@ drop_categories crawler,misc
  assert_equal 5, m.keys.size
 
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
- m = filtered[1][2]
+ m = filtered[1][1]
  assert_equal 1, m['value']
  assert_equal 'Firefox', m['agent_name']
  assert_equal 'pc', m['agent_category']
  assert_equal 'Windows Vista', m['agent_os']
 
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
- m = filtered[2][2]
+ m = filtered[2][1]
  assert_equal 2, m['value']
  assert_equal 'Firefox', m['agent_name']
  assert_equal 'pc', m['agent_category']
  assert_equal 'Linux', m['agent_os']
 
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
- m = filtered[3][2]
+ m = filtered[3][1]
  assert_equal 3, m['value']
  assert_equal 'Safari', m['agent_name']
  assert_equal 'smartphone', m['agent_category']
  assert_equal 'Android', m['agent_os']
 
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
- m = filtered[4][2]
+ m = filtered[4][1]
  assert_equal 4, m['value']
  assert_equal 'docomo', m['agent_name']
  assert_equal 'mobilephone', m['agent_category']
  assert_equal 'docomo', m['agent_os']
 
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
- m = filtered[5][2]
+ m = filtered[5][1]
  assert_equal 5, m['value']
  assert_equal 'PlayStation Vita', m['agent_name']
  assert_equal 'appliance', m['agent_category']
  assert_equal 'PlayStation Vita', m['agent_os']
 
  # 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'
- m = filtered[6][2]
+ m = filtered[6][1]
  assert_equal 6, m['value']
  assert_equal 'Google Desktop', m['agent_name']
  assert_equal 'misc', m['agent_category']
  assert_equal 'UNKNOWN', m['agent_os']
 
  # 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'
- m = filtered[7][2]
+ m = filtered[7][1]
  assert_equal 7, m['value']
  assert_equal 'msnbot', m['agent_name']
  assert_equal 'crawler', m['agent_category']
@@ -210,26 +207,25 @@ drop_categories crawler,misc
 
  # filter & merge
  def test_filter_stream
- d = create_driver(CONFIG2, 'test.message')
+ d = create_driver(CONFIG2)
  time = Time.parse('2012-07-20 16:40:30').to_i
- d.run do
- d.filter({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
- d.filter({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
- d.filter({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
- d.filter({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
- d.filter({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
- d.filter({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
- d.filter({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
- d.filter({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
+ d.run(default_tag: 'test.message') do
+ d.feed(time, {'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
+ d.feed(time, {'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
+ d.feed(time, {'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
+ d.feed(time, {'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
+ d.feed(time, {'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
+ d.feed(time, {'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
+ d.feed(time, {'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
+ d.feed(time, {'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
  end
 
- filtered = d.filtered_as_array
+ filtered = d.filtered
  assert_equal 6, filtered.size
- assert_equal 'test.message', filtered[0][0]
- assert_equal time, filtered[0][1]
+ assert_equal time, filtered[0][0]
 
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
- m = filtered[0][2]
+ m = filtered[0][1]
  assert_equal 8, m.keys.size
  assert_equal 0, m['value']
  assert_equal 'Internet Explorer', m['ua_name']
@@ -240,7 +236,7 @@ drop_categories crawler,misc
  assert_equal '10.0', m['ua_version']
 
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
- m = filtered[1][2]
+ m = filtered[1][1]
  assert_equal 1, m['value']
  assert_equal 'Firefox', m['ua_name']
  assert_equal 'pc', m['ua_category']
@@ -250,7 +246,7 @@ drop_categories crawler,misc
  assert_equal '9.0.1', m['ua_version']
 
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
- m = filtered[2][2]
+ m = filtered[2][1]
  assert_equal 2, m['value']
  assert_equal 'Firefox', m['ua_name']
  assert_equal 'pc', m['ua_category']
@@ -260,7 +256,7 @@ drop_categories crawler,misc
  assert_equal '9.0.1', m['ua_version']
 
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
- m = filtered[3][2]
+ m = filtered[3][1]
  assert_equal 3, m['value']
  assert_equal 'Safari', m['ua_name']
  assert_equal 'smartphone', m['ua_category']
@@ -270,7 +266,7 @@ drop_categories crawler,misc
  assert_equal '4.0', m['ua_version']
 
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
- m = filtered[4][2]
+ m = filtered[4][1]
  assert_equal 4, m['value']
  assert_equal 'docomo', m['ua_name']
  assert_equal 'mobilephone', m['ua_category']
@@ -280,7 +276,7 @@ drop_categories crawler,misc
  assert_equal 'N505i', m['ua_version']
 
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
- m = filtered[5][2]
+ m = filtered[5][1]
  assert_equal 5, m['value']
  assert_equal 'PlayStation Vita', m['ua_name']
  assert_equal 'appliance', m['ua_category']
@@ -292,47 +288,46 @@ drop_categories crawler,misc
292
288
 
293
289
  # drop & non-merge
294
290
  def test_filter_drop
295
- d = create_driver(CONFIG3, 'test.message')
291
+ d = create_driver(CONFIG3)
296
292
  time = Time.parse('2012-07-20 16:40:30').to_i
297
- d.run do
298
- d.filter({'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
299
- d.filter({'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
300
- d.filter({'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
301
- d.filter({'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
302
- d.filter({'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
303
- d.filter({'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
304
- d.filter({'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
305
- d.filter({'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
293
+ d.run(default_tag: 'test.message') do
294
+ d.feed(time, {'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
295
+ d.feed(time, {'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
296
+ d.feed(time, {'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
297
+ d.feed(time, {'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
298
+ d.feed(time, {'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
299
+ d.feed(time, {'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
300
+ d.feed(time, {'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
301
+ d.feed(time, {'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
306
302
  end
307
303
 
308
- filtered = d.filtered_as_array
304
+ filtered = d.filtered
309
305
  assert_equal 6, filtered.size
310
- assert_equal 'test.message', filtered[0][0]
311
- assert_equal time, filtered[0][1]
306
+ assert_equal time, filtered[0][0]
312
307
 
313
308
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
314
- m = filtered[0][2]
309
+ m = filtered[0][1]
315
310
  assert_equal 0, m['value']
316
311
  assert_equal 2, m.keys.size
317
312
 
318
313
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
319
- m = filtered[1][2]
314
+ m = filtered[1][1]
320
315
  assert_equal 1, m['value']
321
316
 
322
317
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
323
- m = filtered[2][2]
318
+ m = filtered[2][1]
324
319
  assert_equal 2, m['value']
325
320
 
326
321
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
327
- m = filtered[3][2]
322
+ m = filtered[3][1]
328
323
  assert_equal 3, m['value']
329
324
 
330
325
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
331
- m = filtered[4][2]
326
+ m = filtered[4][1]
332
327
  assert_equal 4, m['value']
333
328
 
334
329
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
335
- m = filtered[5][2]
330
+ m = filtered[5][1]
336
331
  assert_equal 5, m['value']
337
332
  end
338
333
  end
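The test changes above imply that the event tuples read back from the driver lost their leading tag element: `filtered_as_array` rows were indexed as `[tag, time, record]`, while `filtered` rows are indexed as `[time, record]`. A minimal sketch of that index shift in plain Ruby, assuming nothing beyond what the assertions show. The sample arrays are hand-built for illustration and do not come from a real Fluentd test driver.

```ruby
# Hand-built sample data (hypothetical); no Fluentd test driver is involved.
time = Time.utc(2012, 7, 20, 16, 40, 30).to_i
record = { 'value' => 0, 'user_agent' => 'example-agent' }

# Shape the 0.2.2 tests indexed into: [tag, time, record]
old_rows = [['test.message', time, record]]

# Shape the 1.0.0 tests index into: [time, record]
new_rows = old_rows.map { |_tag, t, r| [t, r] }

old_record = old_rows[0][2] # the record used to sit at index 2
new_record = new_rows[0][1] # it now sits at index 1
puts old_record.equal?(new_record)
```

This is why every `filtered[n][2]` in the old test became `filtered[n][1]`, and why the tag assertion was dropped rather than renumbered.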
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-woothee
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - TAGOMORI Satoshi
@@ -42,14 +42,14 @@ dependencies:
42
42
  name: fluentd
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - "<"
45
+ - - ">="
46
46
  - !ruby/object:Gem::Version
47
47
  version: 0.14.0
48
48
  type: :runtime
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - "<"
52
+ - - ">="
53
53
  - !ruby/object:Gem::Version
54
54
  version: 0.14.0
55
55
  - !ruby/object:Gem::Dependency
@@ -81,10 +81,8 @@ files:
81
81
  - Rakefile
82
82
  - fluent-plugin-woothee.gemspec
83
83
  - lib/fluent/plugin/filter_woothee.rb
84
- - lib/fluent/plugin/out_woothee.rb
85
84
  - test/helper.rb
86
85
  - test/plugin/test_filter_woothee.rb
87
- - test/plugin/test_out_woothee.rb
88
86
  homepage: https://github.com/woothee/fluent-plugin-woothee
89
87
  licenses:
90
88
  - Apache-2.0
@@ -113,4 +111,3 @@ summary: Fluentd plugin to parse UserAgent strings with woothee parser. It adds
113
111
  test_files:
114
112
  - test/helper.rb
115
113
  - test/plugin/test_filter_woothee.rb
116
- - test/plugin/test_out_woothee.rb
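The fluentd requirement in the metadata above flips from `< 0.14.0` to `>= 0.14.0`, so the two gem versions accept disjoint ranges of fluentd. A small sketch using RubyGems' own `Gem::Requirement` to check this; the specific fluentd version numbers tested here are illustrative picks, not taken from the diff.

```ruby
require 'rubygems'

# Constraints as written in the gem metadata for 0.2.2 and 1.0.0.
old_requirement = Gem::Requirement.new('< 0.14.0')
new_requirement = Gem::Requirement.new('>= 0.14.0')

# Illustrative fluentd versions (assumed for the example).
v012 = Gem::Version.new('0.12.43')
v014 = Gem::Version.new('0.14.0')

puts old_requirement.satisfied_by?(v012)
puts old_requirement.satisfied_by?(v014)
puts new_requirement.satisfied_by?(v012)
puts new_requirement.satisfied_by?(v014)
```

No fluentd version satisfies both constraints, so moving between 0.2.2 and 1.0.0 of this plugin also means moving across the fluentd 0.14 boundary.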
@@ -1,154 +0,0 @@
1
- class Fluent::WootheeOutput < Fluent::Output
2
- Fluent::Plugin.register_output('woothee', self)
3
- Fluent::Plugin.register_output('woothee_fast_crawler_filter', self)
4
-
5
- # Define `router` method of v0.12 to support v0.10 or earlier
6
- unless method_defined?(:router)
7
- define_method("router") { Fluent::Engine }
8
- end
9
-
10
- # Define `log` method for v0.10.42 or earlier
11
- unless method_defined?(:log)
12
- define_method("log") { $log }
13
- end
14
-
15
- config_param :tag, :string, :default => nil
16
- config_param :remove_prefix, :string, :default => nil
17
- config_param :add_prefix, :string, :default => nil
18
-
19
- config_param :fast_crawler_filter_mode, :bool, :default => false
20
-
21
- config_param :key_name, :string
22
-
23
- config_param :filter_categories, :default => [] do |val|
24
- val.split(',').map(&:to_sym)
25
- end
26
- config_param :drop_categories, :default => [] do |val|
27
- val.split(',').map(&:to_sym)
28
- end
29
- attr_accessor :mode
30
-
31
- config_param :merge_agent_info, :bool, :default => false
32
- config_param :out_key_name, :string, :default => 'agent_name'
33
- config_param :out_key_category, :string, :default => 'agent_category'
34
- config_param :out_key_os, :string, :default => 'agent_os'
35
- config_param :out_key_os_version, :string, :default => nil # supress output
36
- config_param :out_key_version, :string, :default => nil # supress output
37
- config_param :out_key_vendor, :string, :default => nil # supress output
38
-
39
- def initialize
40
- super
41
- require 'woothee'
42
- end
43
-
44
- def configure(conf)
45
- super
46
-
47
- # tag ->
48
- if not @tag and not @remove_prefix and not @add_prefix
49
- raise Fluent::ConfigError, "missing both of remove_prefix and add_prefix"
50
- end
51
- if @tag and (@remove_prefix or @add_prefix)
52
- raise Fluent::ConfigError, "both of tag and remove_prefix/add_prefix must not be specified"
53
- end
54
- if @remove_prefix
55
- @removed_prefix_string = @remove_prefix + '.'
56
- @removed_length = @removed_prefix_string.length
57
- end
58
- if @add_prefix
59
- @added_prefix_string = @add_prefix + '.'
60
- end
61
- # <- tag
62
-
63
- if conf['type'] == 'woothee_fast_crawler_filter' or @fast_crawler_filter_mode
64
- @fast_crawler_filter_mode = true
65
-
66
- if @filter_categories.size > 0 or @drop_categories.size > 0 or @merge_agent_info
67
- raise Fluent::ConfigError, "fast_crawler_filter cannot be specified with filter/drop/merge options"
68
- end
69
-
70
- return
71
- end
72
-
73
- if @filter_categories.size > 0 and @drop_categories.size > 0
74
- raise Fluent::ConfigError, "both of 'filter' and 'drop' categories specified"
75
- elsif @filter_categories.size > 0
76
- unless @filter_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
77
- raise Fluent::ConfigError, "filter_categories has invalid category name"
78
- end
79
- @mode = :filter
80
- elsif @drop_categories.size > 0
81
- unless @drop_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
82
- raise Fluent::ConfigError, "drop_categories has invalid category name"
83
- end
84
- @mode = :drop
85
- else
86
- @mode = :through
87
- end
88
-
89
- if @mode == :through and not @merge_agent_info
90
- raise Fluent::ConfigError, "configured not to do nothing (not to do either filter/drop nor addition of parser result)"
91
- end
92
- end
93
-
94
- def tag_mangle(tag)
95
- if @tag
96
- @tag
97
- else
98
- if @remove_prefix and
99
- ( (tag.start_with?(@removed_prefix_string) and tag.length > @removed_length) or tag == @remove_prefix)
100
- tag = tag[@removed_length..-1]
101
- end
102
- if @add_prefix
103
- tag = if tag and tag.length > 0
104
- @added_prefix_string + tag
105
- else
106
- @add_prefix
107
- end
108
- end
109
- tag
110
- end
111
- end
112
-
113
- def fast_crawler_filter_emit(tag, es)
114
- es.each do |time,record|
115
- unless Woothee.is_crawler(record[@key_name] || '')
116
- router.emit(tag, time, record)
117
- end
118
- end
119
- end
120
-
121
- def normal_emit(tag, es)
122
- es.each do |time,record|
123
- parsed = Woothee.parse(record[@key_name] || '')
124
-
125
- category = parsed[Woothee::ATTRIBUTE_CATEGORY]
126
- next if @mode == :filter and not @filter_categories.include?(category)
127
- next if @mode == :drop and @drop_categories.include?(category)
128
-
129
- if @merge_agent_info
130
- record = record.merge({
131
- @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
132
- @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
133
- @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
134
- })
135
- record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
136
- record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
137
- record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
138
- end
139
- router.emit(tag, time, record)
140
- end
141
- end
142
-
143
- def emit(tag, es, chain)
144
- tag = tag_mangle(tag)
145
-
146
- if @fast_crawler_filter_mode
147
- fast_crawler_filter_emit(tag, es)
148
- else
149
- normal_emit(tag, es)
150
- end
151
-
152
- chain.next
153
- end
154
- end
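The removed output plugin rewrote tags with `tag`, `remove_prefix` and `add_prefix`. The sketch below restates the `tag_mangle` logic as a standalone method so its behavior can be checked without Fluentd. The method name `rewrite_tag` and its keyword arguments are hypothetical; the branching follows the removed code above.

```ruby
# Standalone restatement of the removed tag_mangle logic.
# `rewrite_tag` and its keyword arguments are hypothetical names.
def rewrite_tag(tag, fixed_tag: nil, remove_prefix: nil, add_prefix: nil)
  # A fixed tag overrides everything else.
  return fixed_tag if fixed_tag

  if remove_prefix
    prefix = remove_prefix + '.'
    if (tag.start_with?(prefix) && tag.length > prefix.length) || tag == remove_prefix
      # When tag == remove_prefix the slice starts past the end and yields nil.
      tag = tag[prefix.length..-1]
    end
  end

  if add_prefix
    tag = (tag && !tag.empty?) ? "#{add_prefix}.#{tag}" : add_prefix
  end

  tag
end

puts rewrite_tag('test.data', remove_prefix: 'test', add_prefix: 'merged')
puts rewrite_tag('test', remove_prefix: 'test', add_prefix: 'merged')
puts rewrite_tag('test.data', fixed_tag: 'selected')
```

A filter plugin does not re-emit events under a new tag, which is presumably why these options have no counterpart in the remaining `filter_woothee.rb`.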
@@ -1,364 +0,0 @@
1
- require 'helper'
2
-
3
- class Fluent::WootheeOutputTest < Test::Unit::TestCase
4
- # fast crawler filter
5
- CONFIG0 = %[
6
- type woothee_fast_crawler_filter
7
- key_name useragent
8
- tag filtered
9
- ]
10
-
11
- # through & merge
12
- CONFIG1 = %[
13
- type woothee
14
- key_name agent
15
- remove_prefix test
16
- add_prefix merged
17
- merge_agent_info yes
18
- ]
19
-
20
- # filter & merge
21
- CONFIG2 = %[
22
- type woothee
23
- key_name agent
24
- filter_categories pc,smartphone,mobilephone,appliance
25
- remove_prefix test
26
- add_prefix merged
27
- merge_agent_info yes
28
- out_key_name ua_name
29
- out_key_category ua_category
30
- out_key_os ua_os
31
- out_key_os_version ua_os_version
32
- out_key_version ua_version
33
- out_key_vendor ua_vendor
34
- ]
35
-
36
- # drop & non-merge
37
- CONFIG3 = %[
38
- type woothee
39
- key_name user_agent
40
- drop_categories crawler,misc
41
- tag selected
42
- ]
43
-
44
- def setup
45
- Fluent::Test.setup
46
- end
47
-
48
- def create_driver(conf=CONFIG1,tag='test')
49
- Fluent::Test::OutputTestDriver.new(Fluent::WootheeOutput, tag).configure(conf)
50
- end
51
-
52
- def test_configure
53
- # fast_crawler_filter
54
- d = create_driver CONFIG0
55
- assert_equal true, d.instance.fast_crawler_filter_mode
56
- assert_equal 'useragent', d.instance.key_name
57
- assert_equal 'filtered', d.instance.tag
58
-
59
- # through & merge
60
- d = create_driver CONFIG1
61
- assert_equal false, d.instance.fast_crawler_filter_mode
62
- assert_equal 'agent', d.instance.key_name
63
- assert_equal 'test', d.instance.remove_prefix
64
- assert_equal 'merged', d.instance.add_prefix
65
-
66
- assert_equal 0, d.instance.filter_categories.size
67
- assert_equal 0, d.instance.drop_categories.size
68
- assert_equal :through, d.instance.mode
69
-
70
- assert_equal true, d.instance.merge_agent_info
71
- assert_equal 'agent_name', d.instance.out_key_name
72
- assert_equal 'agent_category', d.instance.out_key_category
73
- assert_equal 'agent_os', d.instance.out_key_os
74
- assert_nil d.instance.out_key_version
75
- assert_nil d.instance.out_key_vendor
76
-
77
- # filter & merge
78
- d = create_driver CONFIG2
79
- assert_equal false, d.instance.fast_crawler_filter_mode
80
- assert_equal 'agent', d.instance.key_name
81
- assert_equal 'test', d.instance.remove_prefix
82
- assert_equal 'merged', d.instance.add_prefix
83
-
84
- assert_equal 4, d.instance.filter_categories.size
85
- assert_equal [:pc,:smartphone,:mobilephone,:appliance], d.instance.filter_categories
86
- assert_equal 0, d.instance.drop_categories.size
87
- assert_equal :filter, d.instance.mode
88
-
89
- assert_equal true, d.instance.merge_agent_info
90
- assert_equal 'ua_name', d.instance.out_key_name
91
- assert_equal 'ua_category', d.instance.out_key_category
92
- assert_equal 'ua_os', d.instance.out_key_os
93
- assert_equal 'ua_os_version', d.instance.out_key_os_version
94
- assert_equal 'ua_version', d.instance.out_key_version
95
- assert_equal 'ua_vendor', d.instance.out_key_vendor
96
-
97
- # drop & non-merge
98
- d = create_driver CONFIG3
99
- assert_equal false, d.instance.fast_crawler_filter_mode
100
- assert_equal 'user_agent', d.instance.key_name
101
- assert_equal 'selected', d.instance.tag
102
-
103
- assert_equal 0, d.instance.filter_categories.size
104
- assert_equal 2, d.instance.drop_categories.size
105
- assert_equal [:crawler,:misc], d.instance.drop_categories
106
- assert_equal :drop, d.instance.mode
107
-
108
- assert_equal false, d.instance.merge_agent_info
109
- end
110
-
111
- def test_tag_mangle
112
- p = create_driver(CONFIG0).instance
113
- assert_equal 'filtered', p.tag_mangle('data')
114
- assert_equal 'filtered', p.tag_mangle('test.data')
115
- assert_equal 'filtered', p.tag_mangle('test.test.data')
116
- assert_equal 'filtered', p.tag_mangle('test')
117
-
118
- p = create_driver(CONFIG1).instance
119
- assert_equal 'merged.data', p.tag_mangle('data')
120
- assert_equal 'merged.data', p.tag_mangle('test.data')
121
- assert_equal 'merged.test.data', p.tag_mangle('test.test.data')
122
- assert_equal 'merged', p.tag_mangle('test')
123
-
124
- p = create_driver(CONFIG3).instance
125
- assert_equal 'selected', p.tag_mangle('data')
126
- assert_equal 'selected', p.tag_mangle('test.data')
127
- assert_equal 'selected', p.tag_mangle('test.test.data')
128
- assert_equal 'selected', p.tag_mangle('test')
129
- end
130
-
131
- def test_emit_fast_crawler_filter
132
- d = create_driver CONFIG0
133
- time = Time.parse('2012-07-20 16:19:00').to_i
134
- d.run do
135
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1}, time)
136
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2}, time)
137
- d.emit({'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3}, time)
138
- d.emit({'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4}, time)
139
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5}, time)
140
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6}, time)
141
- d.emit({'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7}, time)
142
- d.emit({'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8}, time)
143
- end
144
-
145
- emits = d.emits
146
- assert_equal 4, emits.size
147
-
148
- assert_equal 'filtered', emits[0][0]
149
- assert_equal time, emits[0][1]
150
- assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', emits[0][2]['useragent']
151
- assert_equal 3, emits[0][2]['value']
152
- assert_equal 2, emits[0][2].keys.size
153
-
154
- assert_equal 4, emits[1][2]['value']
155
- assert_equal 6, emits[2][2]['value']
156
- assert_equal 7, emits[3][2]['value']
157
- end
158
-
159
- # # through & merge
160
- def test_emit_through
161
- d = create_driver(CONFIG1, 'test.message')
162
- time = Time.parse('2012-07-20 16:40:30').to_i
163
- d.run do
164
- d.emit({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
165
- d.emit({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
166
- d.emit({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
167
- d.emit({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
168
- d.emit({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
169
- d.emit({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
170
- d.emit({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
171
- d.emit({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
172
- end
173
-
174
- emits = d.emits
175
- assert_equal 8, emits.size
176
- assert_equal 'merged.message', emits[0][0]
177
- assert_equal time, emits[0][1]
178
-
179
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
180
- m = emits[0][2]
181
- assert_equal 0, m['value']
182
- assert_equal 'Internet Explorer', m['agent_name']
183
- assert_equal 'pc', m['agent_category']
184
- assert_equal 'Windows 8', m['agent_os']
185
- assert_equal 5, m.keys.size
186
-
187
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
188
- m = emits[1][2]
189
- assert_equal 1, m['value']
190
- assert_equal 'Firefox', m['agent_name']
191
- assert_equal 'pc', m['agent_category']
192
- assert_equal 'Windows Vista', m['agent_os']
193
-
194
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
195
- m = emits[2][2]
196
- assert_equal 2, m['value']
197
- assert_equal 'Firefox', m['agent_name']
198
- assert_equal 'pc', m['agent_category']
199
- assert_equal 'Linux', m['agent_os']
200
-
201
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
202
- m = emits[3][2]
203
- assert_equal 3, m['value']
204
- assert_equal 'Safari', m['agent_name']
205
- assert_equal 'smartphone', m['agent_category']
206
- assert_equal 'Android', m['agent_os']
207
-
208
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
209
- m = emits[4][2]
210
- assert_equal 4, m['value']
211
- assert_equal 'docomo', m['agent_name']
212
- assert_equal 'mobilephone', m['agent_category']
213
- assert_equal 'docomo', m['agent_os']
214
-
215
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
216
- m = emits[5][2]
217
- assert_equal 5, m['value']
218
- assert_equal 'PlayStation Vita', m['agent_name']
219
- assert_equal 'appliance', m['agent_category']
220
- assert_equal 'PlayStation Vita', m['agent_os']
221
-
222
- # 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'
223
- m = emits[6][2]
224
- assert_equal 6, m['value']
225
- assert_equal 'Google Desktop', m['agent_name']
226
- assert_equal 'misc', m['agent_category']
227
- assert_equal 'UNKNOWN', m['agent_os']
228
-
229
- # 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'
230
- m = emits[7][2]
231
- assert_equal 7, m['value']
232
- assert_equal 'msnbot', m['agent_name']
233
- assert_equal 'crawler', m['agent_category']
234
- assert_equal 'UNKNOWN', m['agent_os']
235
- end
236
-
237
- # # filter & merge
238
- def test_emit_filter
239
- d = create_driver(CONFIG2, 'test.message')
240
- time = Time.parse('2012-07-20 16:40:30').to_i
241
- d.run do
242
- d.emit({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
243
- d.emit({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
244
- d.emit({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
245
- d.emit({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
246
- d.emit({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
247
- d.emit({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
248
- d.emit({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
249
- d.emit({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
250
- end
251
-
252
- emits = d.emits
253
- assert_equal 6, emits.size
254
- assert_equal 'merged.message', emits[0][0]
255
- assert_equal time, emits[0][1]
256
-
257
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
258
- m = emits[0][2]
259
- assert_equal 8, m.keys.size
260
- assert_equal 0, m['value']
261
- assert_equal 'Internet Explorer', m['ua_name']
262
- assert_equal 'pc', m['ua_category']
263
- assert_equal 'Windows 8', m['ua_os']
264
- assert_equal 'NT 6.2', m['ua_os_version']
265
- assert_equal 'Microsoft', m['ua_vendor']
266
- assert_equal '10.0', m['ua_version']
267
-
268
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
269
- m = emits[1][2]
270
- assert_equal 1, m['value']
271
- assert_equal 'Firefox', m['ua_name']
272
- assert_equal 'pc', m['ua_category']
273
- assert_equal 'Windows Vista', m['ua_os']
274
- assert_equal 'NT 6.0', m['ua_os_version']
275
- assert_equal 'Mozilla', m['ua_vendor']
276
- assert_equal '9.0.1', m['ua_version']
277
-
278
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
279
- m = emits[2][2]
280
- assert_equal 2, m['value']
281
- assert_equal 'Firefox', m['ua_name']
282
- assert_equal 'pc', m['ua_category']
283
- assert_equal 'Linux', m['ua_os']
284
- assert_equal 'UNKNOWN', m['ua_os_version']
285
- assert_equal 'Mozilla', m['ua_vendor']
286
- assert_equal '9.0.1', m['ua_version']
287
-
288
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
289
- m = emits[3][2]
290
- assert_equal 3, m['value']
291
- assert_equal 'Safari', m['ua_name']
292
- assert_equal 'smartphone', m['ua_category']
293
- assert_equal 'Android', m['ua_os']
294
- assert_equal '3.1', m['ua_os_version']
295
- assert_equal 'Apple', m['ua_vendor']
296
- assert_equal '4.0', m['ua_version']
297
-
298
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
299
- m = emits[4][2]
300
- assert_equal 4, m['value']
301
- assert_equal 'docomo', m['ua_name']
302
- assert_equal 'mobilephone', m['ua_category']
303
- assert_equal 'docomo', m['ua_os']
304
- assert_equal 'UNKNOWN', m['ua_os_version']
305
- assert_equal 'docomo', m['ua_vendor']
306
- assert_equal 'N505i', m['ua_version']
307
-
308
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
309
- m = emits[5][2]
310
- assert_equal 5, m['value']
311
- assert_equal 'PlayStation Vita', m['ua_name']
312
- assert_equal 'appliance', m['ua_category']
313
- assert_equal 'PlayStation Vita', m['ua_os']
314
- assert_equal '1.51', m['ua_os_version']
315
- assert_equal 'Sony', m['ua_vendor']
316
- assert_equal 'UNKNOWN', m['ua_version']
317
- end
318
-
319
- # # drop & non-merge
320
- def test_emit_drop
321
- d = create_driver(CONFIG3, 'test.message')
322
- time = Time.parse('2012-07-20 16:40:30').to_i
323
- d.run do
324
- d.emit({'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
325
- d.emit({'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
326
- d.emit({'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
327
- d.emit({'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
328
- d.emit({'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
329
- d.emit({'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
330
- d.emit({'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
331
- d.emit({'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
332
- end
333
-
334
- emits = d.emits
335
- assert_equal 6, emits.size
336
- assert_equal 'selected', emits[0][0]
337
- assert_equal time, emits[0][1]
338
-
339
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
340
- m = emits[0][2]
341
- assert_equal 0, m['value']
342
- assert_equal 2, m.keys.size
343
-
344
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
345
- m = emits[1][2]
346
- assert_equal 1, m['value']
347
-
348
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
349
- m = emits[2][2]
350
- assert_equal 2, m['value']
351
-
352
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
353
- m = emits[3][2]
354
- assert_equal 3, m['value']
355
-
356
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
357
- m = emits[4][2]
358
- assert_equal 4, m['value']
359
-
360
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
361
- m = emits[5][2]
362
- assert_equal 5, m['value']
363
- end
364
- end
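The removed tests feed the same eight user agents through three modes and expect 8, 6 and 6 events back. The sketch below reproduces those counts without calling woothee: the category list is copied from the `agent_category` assertions in `test_emit_through`, and `keep_category?` is a hypothetical helper that mirrors the two `next if` checks in `normal_emit`.

```ruby
# Categories asserted in the removed test for 'value' 0 through 7.
categories = %i[pc pc pc smartphone mobilephone appliance misc crawler]

# Hypothetical helper mirroring the `next if ...` checks in normal_emit.
def keep_category?(category, mode, filter_categories: [], drop_categories: [])
  return false if mode == :filter && !filter_categories.include?(category)
  return false if mode == :drop && drop_categories.include?(category)
  true
end

through = categories.select { |c| keep_category?(c, :through) }
filtered = categories.select do |c|
  keep_category?(c, :filter, filter_categories: %i[pc smartphone mobilephone appliance])
end
dropped = categories.select do |c|
  keep_category?(c, :drop, drop_categories: %i[crawler misc])
end

puts through.size
puts filtered.size
puts dropped.size
```

With this input the `filter` and `drop` configurations keep the same six events, since the four filtered categories are exactly the complement of `crawler` and `misc` in the sample.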