fluent-plugin-woothee 0.2.2 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 6adc6d94e1e14d85619c6f5f320ff6c9e1b7ec77
- data.tar.gz: 76a0c6592d1e33e6625bd8cb58839d64d1f7344d
+ metadata.gz: cf79bb6e1d07bdddb4cae76b7ae7674fbe7b460f
+ data.tar.gz: 6d1c7933f26b6f2b7bb7df6aacb11f645ca6f649
  SHA512:
- metadata.gz: 0446a8bc15e2e7cca56d407c0085574fd821ae24dfec837b0526b509e50f5c6baa18cf5b80c18dc595b4abc5775e76f8fbb6ec6ecf893c92c1fe0a7df25c4603
- data.tar.gz: 06e8582442bc4409103f92e3b40eadd4b41e0d9cf51731ebdb96e95d09ca4cd5204f6807ed8148d3daeb2a0b77403cd167d73ef8ec565485d8575018905fe49f
+ metadata.gz: 2b93cc550a0584bd74421878da8efd1da5d97c7e8f7fe09f5e98bf5cabd0b4d30c283220a5503fb5298304358a5a6986b592039394f4e06bdfef0b446f0d7034
+ data.tar.gz: e0444b74e92eb0bdf2cb6968b82d9998bed32b29ec025aa8a6c6272a62185e050f2b44610e9cb00a285144dc2185ad15867157c6644d7e02cb4b14c5397dbd41
@@ -1,7 +1,6 @@
  language: ruby
  sudo: false
  rvm:
- - 2.0.0
  - 2.1
  - 2.2
  - 2.3.0
data/README.md CHANGED
@@ -1,133 +1,84 @@
1
1
  # fluent-plugin-woothee
2
2
 
3
- ## WootheeOutput
4
-
5
- 'fluent-plugin-woothee' is a Fluentd plugin to parse UserAgent strings and to filter/drop specified categories of user terminals (like 'pc', 'smartphone' and so on).
3
+ 'fluent-plugin-woothee' is a Fluentd filter plugin to parse UserAgent strings and to filter/drop specified categories of user terminals (like 'pc', 'smartphone' and so on).
6
4
 
7
5
  'woothee' is multi-language user-agent strings parser project. See: https://github.com/woothee/woothee
8
6
 
9
7
  ## Configuration
10
8
 
11
- To add woothee parser result into matched messages:
12
-
13
- <match input.**>
14
- type woothee
15
- key_name agent
16
- remove_prefix input
17
- add_prefix merged
18
- merge_agent_info yes
19
- </match>
20
-
21
- Output messages with tag 'merged.**' has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
22
-
23
- <match input.**>
24
- type woothee
25
- key_name agent
26
- remove_prefix input
27
- add_prefix merged
28
- merge_agent_info yes
29
- out_key_name ua_name
30
- out_key_category ua_category
31
- out_key_os ua_os
32
- out_key_os_version ua_os_version
33
- out_key_version ua_version
34
- out_key_vendor ua_vendor
35
- </match>
36
-
37
- To re-emit messages with specified user-agent categories (and merge woothee parser result), configure like this:
38
-
39
- <match input.**>
40
- type woothee
41
- key_name agent
42
- filter_categories pc,smartphone,mobilephone,appliance
43
- remove_prefix input
44
- add_prefix merged
45
- merge_agent_info yes
46
- </match>
47
-
48
- Or, you can specify categories to drop (and not to merge woothee result):
49
-
50
- <match input.**>
51
- type woothee
52
- key_name agent
53
- drop_categories crawler
54
- remove_prefix input
55
- add_prefix merged
56
- merge_agent_info false # default
57
- </match>
58
-
59
- ### Fast Crawler Filter
60
-
61
- If you want to drop __almost__ all of messages with crawler's user-agent, and not to merge woothee result, you just specify plugin type:
62
-
63
- <match input.**>
64
- type woothee_fast_crawler_filter
65
- key_name useragent
66
- tag filtered
67
- </match>
68
-
69
- 'fluent-plugin-woothee' uses 'Woothee.is_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
70
- If you want to drop all of crawlers completely, specify 'type woothee' and 'drop_categories crawler'.
71
-
72
- ## WootheeFilter
73
-
74
- This is filter version of 'fluent-plugin-woothee'.
75
- Note that this filter version does not have rewrite tag functionality.
76
-
77
- ## Configuration
78
-
79
- To add woothee parser result into filtered messages:
80
-
81
- <filter input.**>
82
- type woothee
83
- key_name agent
84
- merge_agent_info yes
85
- </filter>
86
-
87
- Filtered messages with non-modified tag has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
88
-
89
- <filter input.**>
90
- type woothee
91
- key_name agent
92
- merge_agent_info yes
93
- out_key_name ua_name
94
- out_key_category ua_category
95
- out_key_os ua_os
96
- out_key_os_version ua_os_version
97
- out_key_version ua_version
98
- out_key_vendor ua_vendor
99
- </filter>
100
-
101
- To filter messages with specified user-agent categories (and merge woothee parser result), configure like this:
102
-
103
- <filter input.**>
104
- type woothee
105
- key_name agent
106
- filter_categories pc,smartphone,mobilephone,appliance
107
- merge_agent_info yes
108
- </filter>
9
+ To add woothee parser result into messages:
10
+
11
+ <label @accesslog>
12
+ <filter input.**>
13
+ @type woothee
14
+ key_name agent
15
+ merge_agent_info yes
16
+ </filter>
17
+ <match ...>
18
+ </match>
19
+ </label>
20
+
21
+ Result messages has attributes like 'agent\_name', 'agent\_category' and 'agent\_os' from woothee parser result. If you want to change attribute names, or want to merge more attributes of browser vendor and its version, write configurations as below:
22
+
23
+ <label @accesslog>
24
+ <filter input.**>
25
+ @type woothee
26
+ key_name agent
27
+ merge_agent_info yes
28
+
29
+ out_key_name ua_name
30
+ out_key_category ua_category
31
+ out_key_os ua_os
32
+ out_key_os_version ua_os_version
33
+ out_key_version ua_version
34
+ out_key_vendor ua_vendor
35
+ </filter>
36
+ <match ...>
37
+ </match>
38
+ </label>
39
+
40
+ To pass messages only with specified user-agent categories (and merge woothee parser result), configure like this:
41
+
42
+ <label @accesslog>
43
+ <filter input.**>
44
+ @type woothee
45
+ key_name agent
46
+ merge_agent_info yes
47
+ filter_categories pc,smartphone,mobilephone,appliance
48
+ </filter> # logs of other categories will be dropped
49
+
50
+ # ...
51
+ </label>
109
52
 
110
53
  Or, you can specify categories to drop (and not to merge woothee result):
111
54
 
112
- <filter input.**>
113
- type woothee
114
- key_name agent
115
- drop_categories crawler
116
- merge_agent_info false # default
117
- </filter>
55
+ <label @accesslog>
56
+ <filter input.**>
57
+ @type woothee
58
+ key_name agent
59
+ merge_agent_info false # default
60
+ drop_categories crawler
61
+ </filter>
62
+
63
+ # ...
64
+ </label>
118
65
 
119
66
  ### Fast Crawler Filter
120
67
 
121
68
  If you want to drop __almost__ all of messages with crawler's user-agent, and not to merge woothee result, you just specify plugin type:
122
69
 
123
70
  <filter input.**>
124
- type woothee_fast_crawler_filter
71
+ @type woothee_fast_crawler_filter
125
72
  key_name useragent
126
73
  </filter>
127
74
 
128
- 'fluent-plugin-woothee' uses 'Woothee.is_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
75
+ 'fluent-plugin-woothee' uses 'Woothee.is\_crawler' of woothee with this configuration, fast and incomplete method to judge user-agent is crawler or not.
129
76
  If you want to drop all of crawlers completely, specify 'type woothee' and 'drop_categories crawler'.
130
77
 
78
+ ### Output plugin
79
+
80
+ The output version of woothee plugin is not supported in versions for Fluentd v0.14.
81
+
131
82
  ## TODO
132
83
 
133
84
  * patches welcome!
@@ -2,7 +2,7 @@

  Gem::Specification.new do |gem|
  gem.name = "fluent-plugin-woothee"
- gem.version = "0.2.2"
+ gem.version = "1.0.0"
  gem.authors = ["TAGOMORI Satoshi"]
  gem.email = ["tagomoris@gmail.com"]
  gem.description = %q{parsing by Project Woothee. See https://github.com/woothee/woothee }
@@ -17,6 +17,6 @@ Gem::Specification.new do |gem|

  gem.add_development_dependency "rake"
  gem.add_development_dependency "test-unit", "~> 3.0.2"
- gem.add_runtime_dependency "fluentd", "< 0.14.0"
+ gem.add_runtime_dependency "fluentd", ">= 0.14.0"
  gem.add_runtime_dependency "woothee", ">= 1.0.0"
  end
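
The runtime dependency flips from `< 0.14.0` to `>= 0.14.0`, so the two releases target disjoint Fluentd ranges. A minimal sketch of how a resolver would read the two constraints, using RubyGems' own `Gem::Requirement`; the sample Fluentd version strings and the `compatibility` helper are illustrative assumptions, not part of the gem:

```ruby
require 'rubygems'

old_req = Gem::Requirement.new('< 0.14.0')   # constraint in 0.2.2
new_req = Gem::Requirement.new('>= 0.14.0')  # constraint in 1.0.0

# Illustrative Fluentd versions, chosen only to exercise both sides of the boundary.
samples = %w[0.12.43 0.14.0 1.16.0].map { |v| Gem::Version.new(v) }

# Hypothetical helper: map each version string to whether the requirement accepts it.
def compatibility(requirement, versions)
  versions.map { |v| [v.to_s, requirement.satisfied_by?(v)] }.to_h
end

OLD_COMPAT = compatibility(old_req, samples)
NEW_COMPAT = compatibility(new_req, samples)

puts "0.2.2 accepts: #{OLD_COMPAT.select { |_, ok| ok }.keys.join(', ')}"
puts "1.0.0 accepts: #{NEW_COMPAT.select { |_, ok| ok }.keys.join(', ')}"
```

No Fluentd version satisfies both constraints, so users on Fluentd older than 0.14 would presumably need to stay on the 0.2.x line of this plugin.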
@@ -1,46 +1,47 @@
- class Fluent::WootheeFilter < Fluent::Filter
+ require 'fluent/plugin/filter'
+ require 'woothee'
+
+ class Fluent::Plugin::WootheeFilter < Fluent::Plugin::Filter
  Fluent::Plugin.register_filter('woothee', self)
  Fluent::Plugin.register_filter('woothee_fast_crawler_filter', self)

- config_param :fast_crawler_filter_mode, :bool, :default => false
+ config_param :fast_crawler_filter_mode, :bool, default: false

  config_param :key_name, :string

- config_param :filter_categories, :default => [] do |val|
- val.split(',').map(&:to_sym)
- end
- config_param :drop_categories, :default => [] do |val|
- val.split(',').map(&:to_sym)
- end
+ config_param :filter_categories, :array, value_type: :string, default: []
+ config_param :drop_categories, :array, value_type: :string, default: []
+
  attr_accessor :mode

- config_param :merge_agent_info, :bool, :default => false
- config_param :out_key_name, :string, :default => 'agent_name'
- config_param :out_key_category, :string, :default => 'agent_category'
- config_param :out_key_os, :string, :default => 'agent_os'
- config_param :out_key_os_version, :string, :default => nil # supress output
- config_param :out_key_version, :string, :default => nil # supress output
- config_param :out_key_vendor, :string, :default => nil # supress output
-
- def initialize
- super
- require 'woothee'
- end
+ config_param :merge_agent_info, :bool, default: false
+ config_param :out_key_name, :string, default: 'agent_name'
+ config_param :out_key_category, :string, default: 'agent_category'
+ config_param :out_key_os, :string, default: 'agent_os'
+ config_param :out_key_os_version, :string, default: nil # supress output in default
+ config_param :out_key_version, :string, default: nil # supress output in default
+ config_param :out_key_vendor, :string, default: nil # supress output in default

  def configure(conf)
+ specified_type_name = conf['@type']
+
  super

- if conf['type'] == 'woothee_fast_crawler_filter' or @fast_crawler_filter_mode
+ @filter_categories = @filter_categories.map(&:to_sym)
+ @drop_categories = @drop_categories.map(&:to_sym)
+
+ if specified_type_name == 'woothee_fast_crawler_filter' || @fast_crawler_filter_mode
  @fast_crawler_filter_mode = true

- if @filter_categories.size > 0 or @drop_categories.size > 0 or @merge_agent_info
+ if @filter_categories.size > 0 || @drop_categories.size > 0 || @merge_agent_info
  raise Fluent::ConfigError, "fast_crawler_filter cannot be specified with filter/drop/merge options"
  end

+ define_singleton_method(:filter, method(:filter_fast_crawler))
  return
  end

- if @filter_categories.size > 0 and @drop_categories.size > 0
+ if @filter_categories.size > 0 && @drop_categories.size > 0
  raise Fluent::ConfigError, "both of 'filter' and 'drop' categories specified"
  elsif @filter_categories.size > 0
  unless @filter_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
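
Version 1.0.0 declares `filter_categories` and `drop_categories` as `:array` parameters of strings and symbolizes them inside `configure`, where 0.2.2 split the raw string in a `config_param` block. A plain-Ruby sketch of both conversions and of the membership check in the hunk above; `CATEGORY_LIST` is a local stand-in for `Woothee::CATEGORY_LIST` (its contents here are an assumption), and the three helper names are hypothetical:

```ruby
# Stand-in for Woothee::CATEGORY_LIST; the exact contents are assumed for illustration.
CATEGORY_LIST = %i[pc smartphone mobilephone crawler appliance misc].freeze

# 0.2.2 style: the config_param block receives the raw string and splits it.
def parse_categories(raw)
  raw.split(',').map(&:to_sym)
end

# 1.0.0 style: the config layer already yields an array of strings,
# so configure only has to symbolize each element.
def symbolize_categories(list)
  list.map(&:to_sym)
end

# Equivalent to the plugin's reduce(true) { |r, i| r and LIST.include?(i) } form:
# true only when every requested category is a known one.
def valid_categories?(categories)
  categories.all? { |c| CATEGORY_LIST.include?(c) }
end

p parse_categories('pc,smartphone')
p symbolize_categories(%w[crawler misc])
p valid_categories?(%i[pc tablet])
```

Both paths end with an array of symbols, so the later `include?(category)` checks can stay unchanged across the two versions.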
@@ -56,52 +57,47 @@ class Fluent::WootheeFilter < Fluent::Filter
  @mode = :through
  end

- if @mode == :through and not @merge_agent_info
+ if @mode == :through && ! @merge_agent_info
  raise Fluent::ConfigError, "configured not to do nothing (not to do either filter/drop nor addition of parser result)"
  end
- end

- def fast_crawler_filter_stream(tag, es)
- new_es = Fluent::MultiEventStream.new
+ define_singleton_method(:filter, method(:filter_standard))
+ end

- es.each do |time,record|
- unless Woothee.is_crawler(record[@key_name] || '')
- new_es.add(time, record.dup)
- end
+ def filter(tag, time, record)
+ # dynamically overwritten by #configure
+ if @fast_crawler_filter_mode
+ filter_fast_crawler(tag, time, record)
+ else
+ filter_standard(tag, time, record)
  end
- new_es
  end

- def normal_filter_stream(tag, es)
- new_es = Fluent::MultiEventStream.new
-
- es.each do |time,record|
- parsed = Woothee.parse(record[@key_name] || '')
-
- category = parsed[Woothee::ATTRIBUTE_CATEGORY]
- next if @mode == :filter and not @filter_categories.include?(category)
- next if @mode == :drop and @drop_categories.include?(category)
-
- if @merge_agent_info
- record = record.merge({
- @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
- @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
- @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
- })
- record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
- record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
- record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
- end
- new_es.add(time, record.dup)
+ def filter_fast_crawler(tag, time, record)
+ if Woothee.is_crawler(record[@key_name] || '')
+ nil
+ else
+ record
  end
- new_es
  end

- def filter_stream(tag, es)
- if @fast_crawler_filter_mode
- fast_crawler_filter_stream(tag, es)
- else
- normal_filter_stream(tag, es)
+ def filter_standard(tag, time, record)
+ parsed = Woothee.parse(record[@key_name] || '')
+
+ category = parsed[Woothee::ATTRIBUTE_CATEGORY]
+ return nil if @mode == :filter && !@filter_categories.include?(category)
+ return nil if @mode == :drop && @drop_categories.include?(category)
+
+ if @merge_agent_info
+ record = record.merge({
+ @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
+ @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
+ @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
+ })
+ record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
+ record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
+ record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
  end
+ record
  end
- end if defined?(Fluent::Filter)
+ end
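
The rewritten plugin moves from `filter_stream(tag, es)` to the per-record `filter(tag, time, record)`: returning `nil` drops the event, returning a hash passes it on, and `configure` binds the chosen implementation with `define_singleton_method`. The sketch below reproduces that dispatch without Fluentd or woothee. `SketchFilter`, its constructor arguments, and the `CRAWLER_PATTERN` regexp are hypothetical stand-ins (the real plugin calls `Woothee.is_crawler` and `Woothee.parse`), and the standard path is simplified to always merge a category:

```ruby
class SketchFilter
  # Crude stand-in for Woothee.is_crawler; for illustration only.
  CRAWLER_PATTERN = /bot|slurp|yeti/i

  def initialize(key_name:, fast_crawler: false, drop_categories: [])
    @key_name = key_name
    @drop_categories = drop_categories
    # Bind #filter once at configuration time, mirroring the plugin's #configure.
    impl = fast_crawler ? :filter_fast_crawler : :filter_standard
    define_singleton_method(:filter, method(impl))
  end

  # Fast path: nil drops the record, anything else passes it through untouched.
  def filter_fast_crawler(_tag, _time, record)
    crawler?(record[@key_name] || '') ? nil : record
  end

  # Standard path: classify, drop unwanted categories, merge the result.
  # Anything that is not a crawler is labelled :pc purely for this sketch.
  def filter_standard(_tag, _time, record)
    category = crawler?(record[@key_name] || '') ? :crawler : :pc
    return nil if @drop_categories.include?(category)

    record.merge('agent_category' => category.to_s)
  end

  private

  def crawler?(user_agent)
    CRAWLER_PATTERN.match?(user_agent)
  end
end

fast = SketchFilter.new(key_name: 'useragent', fast_crawler: true)
p fast.filter('test', 0, { 'useragent' => 'Googlebot/2.1', 'value' => 1 })
```

Binding the method once means the mode check is not repeated for every record; the explicit `filter` body kept in the diff then acts only as a fallback.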
@@ -22,7 +22,6 @@ unless ENV.has_key?('VERBOSE')
  $log = nulllogger
  end

- require 'fluent/plugin/out_woothee'
  require 'fluent/plugin/filter_woothee'

  class Test::Unit::TestCase
@@ -1,4 +1,5 @@
1
1
  require 'helper'
2
+ require 'fluent/test/driver/filter'
2
3
 
3
4
  class Fluent::WootheeFilterTest < Test::Unit::TestCase
4
5
  # fast crawler filter
@@ -36,13 +37,11 @@ drop_categories crawler,misc
36
37
  ]
37
38
 
38
39
  def setup
39
- omit("Use fluentd v0.12 or later") unless defined?(Fluent::Filter)
40
-
41
40
  Fluent::Test.setup
42
41
  end
43
42
 
44
- def create_driver(conf=CONFIG1,tag='test')
45
- Fluent::Test::FilterTestDriver.new(Fluent::WootheeFilter, tag).configure(conf)
43
+ def create_driver(conf=CONFIG1)
44
+ Fluent::Test::Driver::Filter.new(Fluent::Plugin::WootheeFilter).configure(conf)
46
45
  end
47
46
 
48
47
  class TestConfigure < self
@@ -105,53 +104,51 @@ drop_categories crawler,misc
105
104
  def test_filter_fast_crawler_filter_stream
106
105
  d = create_driver CONFIG0
107
106
  time = Time.parse('2012-07-20 16:19:00').to_i
108
- d.run do
109
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1}, time)
110
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2}, time)
111
- d.filter({'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3}, time)
112
- d.filter({'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4}, time)
113
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5}, time)
114
- d.filter({'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6}, time)
115
- d.filter({'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7}, time)
116
- d.filter({'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8}, time)
107
+ d.run(default_tag: 'test') do
108
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1})
109
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2})
110
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3})
111
+ d.feed(time, {'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4})
112
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5})
113
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6})
114
+ d.feed(time, {'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7})
115
+ d.feed(time, {'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8})
117
116
  end
118
117
 
119
- filtered = d.filtered_as_array
118
+ filtered = d.filtered
120
119
  assert_equal 4, filtered.size
121
120
 
122
- assert_equal 'test', filtered[0][0]
123
- assert_equal time, filtered[0][1]
124
- assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', filtered[0][2]['useragent']
125
- assert_equal 3, filtered[0][2]['value']
126
- assert_equal 2, filtered[0][2].keys.size
121
+ assert_equal time, filtered[0][0]
122
+ assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', filtered[0][1]['useragent']
123
+ assert_equal 3, filtered[0][1]['value']
124
+ assert_equal 2, filtered[0][1].keys.size
127
125
 
128
- assert_equal 4, filtered[1][2]['value']
129
- assert_equal 6, filtered[2][2]['value']
130
- assert_equal 7, filtered[3][2]['value']
126
+ assert_equal 4, filtered[1][1]['value']
127
+ assert_equal 6, filtered[2][1]['value']
128
+ assert_equal 7, filtered[3][1]['value']
131
129
  end
132
130
 
133
131
  # through & merge
134
132
  def test_filter_through
135
- d = create_driver(CONFIG1, 'test.message')
133
+ d = create_driver(CONFIG1)
136
134
  time = Time.parse('2012-07-20 16:40:30').to_i
137
- d.run do
138
- d.filter({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
139
- d.filter({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
140
- d.filter({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
141
- d.filter({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
142
- d.filter({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
143
- d.filter({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
144
- d.filter({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
145
- d.filter({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
135
+ d.run(default_tag: 'test.message') do
136
+ d.feed(time, {'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
137
+ d.feed(time, {'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
138
+ d.feed(time, {'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
139
+ d.feed(time, {'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
140
+ d.feed(time, {'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
141
+ d.feed(time, {'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
142
+ d.feed(time, {'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
143
+ d.feed(time, {'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
146
144
  end
147
145
 
148
- filtered = d.filtered_as_array
146
+ filtered = d.filtered
149
147
  assert_equal 8, filtered.size
150
- assert_equal 'test.message', filtered[0][0]
151
- assert_equal time, filtered[0][1]
148
+ assert_equal time, filtered[0][0]
152
149
 
153
150
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
154
- m = filtered[0][2]
151
+ m = filtered[0][1]
155
152
  assert_equal 0, m['value']
156
153
  assert_equal 'Internet Explorer', m['agent_name']
157
154
  assert_equal 'pc', m['agent_category']
@@ -159,49 +156,49 @@ drop_categories crawler,misc
159
156
  assert_equal 5, m.keys.size
160
157
 
161
158
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
162
- m = filtered[1][2]
159
+ m = filtered[1][1]
163
160
  assert_equal 1, m['value']
164
161
  assert_equal 'Firefox', m['agent_name']
165
162
  assert_equal 'pc', m['agent_category']
166
163
  assert_equal 'Windows Vista', m['agent_os']
167
164
 
168
165
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
169
- m = filtered[2][2]
166
+ m = filtered[2][1]
170
167
  assert_equal 2, m['value']
171
168
  assert_equal 'Firefox', m['agent_name']
172
169
  assert_equal 'pc', m['agent_category']
173
170
  assert_equal 'Linux', m['agent_os']
174
171
 
175
172
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
176
- m = filtered[3][2]
173
+ m = filtered[3][1]
177
174
  assert_equal 3, m['value']
178
175
  assert_equal 'Safari', m['agent_name']
179
176
  assert_equal 'smartphone', m['agent_category']
180
177
  assert_equal 'Android', m['agent_os']
181
178
 
182
179
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
183
- m = filtered[4][2]
180
+ m = filtered[4][1]
184
181
  assert_equal 4, m['value']
185
182
  assert_equal 'docomo', m['agent_name']
186
183
  assert_equal 'mobilephone', m['agent_category']
187
184
  assert_equal 'docomo', m['agent_os']
188
185
 
189
186
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
190
- m = filtered[5][2]
187
+ m = filtered[5][1]
191
188
  assert_equal 5, m['value']
192
189
  assert_equal 'PlayStation Vita', m['agent_name']
193
190
  assert_equal 'appliance', m['agent_category']
194
191
  assert_equal 'PlayStation Vita', m['agent_os']
195
192
 
196
193
  # 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'
197
- m = filtered[6][2]
194
+ m = filtered[6][1]
198
195
  assert_equal 6, m['value']
199
196
  assert_equal 'Google Desktop', m['agent_name']
200
197
  assert_equal 'misc', m['agent_category']
201
198
  assert_equal 'UNKNOWN', m['agent_os']
202
199
 
203
200
  # 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'
204
- m = filtered[7][2]
201
+ m = filtered[7][1]
205
202
  assert_equal 7, m['value']
206
203
  assert_equal 'msnbot', m['agent_name']
207
204
  assert_equal 'crawler', m['agent_category']
@@ -210,26 +207,25 @@ drop_categories crawler,misc
210
207
 
211
208
  # filter & merge
212
209
  def test_filter_stream
213
- d = create_driver(CONFIG2, 'test.message')
210
+ d = create_driver(CONFIG2)
214
211
  time = Time.parse('2012-07-20 16:40:30').to_i
215
- d.run do
216
- d.filter({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
217
- d.filter({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
218
- d.filter({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
219
- d.filter({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
220
- d.filter({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
221
- d.filter({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
222
- d.filter({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
223
- d.filter({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
212
+ d.run(default_tag: 'test.message') do
213
+ d.feed(time, {'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
214
+ d.feed(time, {'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
215
+ d.feed(time, {'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
216
+ d.feed(time, {'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
217
+ d.feed(time, {'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
218
+ d.feed(time, {'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
219
+ d.feed(time, {'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
220
+ d.feed(time, {'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
224
221
  end
225
222
 
226
- filtered = d.filtered_as_array
223
+ filtered = d.filtered
227
224
  assert_equal 6, filtered.size
228
- assert_equal 'test.message', filtered[0][0]
229
- assert_equal time, filtered[0][1]
225
+ assert_equal time, filtered[0][0]
230
226
 
231
227
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
232
- m = filtered[0][2]
228
+ m = filtered[0][1]
233
229
  assert_equal 8, m.keys.size
234
230
  assert_equal 0, m['value']
235
231
  assert_equal 'Internet Explorer', m['ua_name']
@@ -240,7 +236,7 @@ drop_categories crawler,misc
240
236
  assert_equal '10.0', m['ua_version']
241
237
 
242
238
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
243
- m = filtered[1][2]
239
+ m = filtered[1][1]
244
240
  assert_equal 1, m['value']
245
241
  assert_equal 'Firefox', m['ua_name']
246
242
  assert_equal 'pc', m['ua_category']
@@ -250,7 +246,7 @@ drop_categories crawler,misc
250
246
  assert_equal '9.0.1', m['ua_version']
251
247
 
252
248
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
253
- m = filtered[2][2]
249
+ m = filtered[2][1]
254
250
  assert_equal 2, m['value']
255
251
  assert_equal 'Firefox', m['ua_name']
256
252
  assert_equal 'pc', m['ua_category']
@@ -260,7 +256,7 @@ drop_categories crawler,misc
260
256
  assert_equal '9.0.1', m['ua_version']
261
257
 
262
258
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
263
- m = filtered[3][2]
259
+ m = filtered[3][1]
264
260
  assert_equal 3, m['value']
265
261
  assert_equal 'Safari', m['ua_name']
266
262
  assert_equal 'smartphone', m['ua_category']
@@ -270,7 +266,7 @@ drop_categories crawler,misc
270
266
  assert_equal '4.0', m['ua_version']
271
267
 
272
268
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
273
- m = filtered[4][2]
269
+ m = filtered[4][1]
274
270
  assert_equal 4, m['value']
275
271
  assert_equal 'docomo', m['ua_name']
276
272
  assert_equal 'mobilephone', m['ua_category']
@@ -280,7 +276,7 @@ drop_categories crawler,misc
280
276
  assert_equal 'N505i', m['ua_version']
281
277
 
282
278
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
283
- m = filtered[5][2]
279
+ m = filtered[5][1]
284
280
  assert_equal 5, m['value']
285
281
  assert_equal 'PlayStation Vita', m['ua_name']
286
282
  assert_equal 'appliance', m['ua_category']
@@ -292,47 +288,46 @@ drop_categories crawler,misc
292
288
 
293
289
  # drop & non-merge
294
290
  def test_filter_drop
295
- d = create_driver(CONFIG3, 'test.message')
291
+ d = create_driver(CONFIG3)
296
292
  time = Time.parse('2012-07-20 16:40:30').to_i
297
- d.run do
298
- d.filter({'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
299
- d.filter({'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
300
- d.filter({'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
301
- d.filter({'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
302
- d.filter({'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
303
- d.filter({'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
304
- d.filter({'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
305
- d.filter({'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
293
+ d.run(default_tag: 'test.message') do
294
+ d.feed(time, {'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'})
295
+ d.feed(time, {'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
296
+ d.feed(time, {'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'})
297
+ d.feed(time, {'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'})
298
+ d.feed(time, {'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'})
299
+ d.feed(time, {'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'})
300
+ d.feed(time, {'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'})
301
+ d.feed(time, {'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'})
306
302
  end
307
303
 
308
- filtered = d.filtered_as_array
304
+ filtered = d.filtered
309
305
  assert_equal 6, filtered.size
310
- assert_equal 'test.message', filtered[0][0]
311
- assert_equal time, filtered[0][1]
306
+ assert_equal time, filtered[0][0]
312
307
 
313
308
  # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
314
- m = filtered[0][2]
309
+ m = filtered[0][1]
315
310
  assert_equal 0, m['value']
316
311
  assert_equal 2, m.keys.size
317
312
 
318
313
  # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
319
- m = filtered[1][2]
314
+ m = filtered[1][1]
320
315
  assert_equal 1, m['value']
321
316
 
322
317
  # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
323
- m = filtered[2][2]
318
+ m = filtered[2][1]
324
319
  assert_equal 2, m['value']
325
320
 
326
321
  # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
327
- m = filtered[3][2]
322
+ m = filtered[3][1]
328
323
  assert_equal 3, m['value']
329
324
 
330
325
  # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
331
- m = filtered[4][2]
326
+ m = filtered[4][1]
332
327
  assert_equal 4, m['value']
333
328
 
334
329
  # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
335
- m = filtered[5][2]
330
+ m = filtered[5][1]
336
331
  assert_equal 5, m['value']
337
332
  end
338
333
  end
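Note on the test changes above: the assertions move from three-element events (`filtered[0][0]` was the tag, `[0][1]` the time, `[0][2]` the record) to two-element events (`[0][0]` the time, `[0][1]` the record). The following is a minimal plain-Ruby sketch of that index shift. It uses hand-built arrays as stand-ins and does not call the Fluentd test driver; the constant and method names are hypothetical.

```ruby
# Hand-built stand-ins for the two event shapes seen in the assertions.
# These arrays are illustrative only; they are not produced by Fluentd here.
SAMPLE_TIME = Time.utc(2012, 7, 20, 16, 40, 30).to_i
SAMPLE_RECORD = { 'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12' }.freeze

# Old assertions indexed events as [tag, time, record].
OLD_STYLE_EVENTS = [['test.message', SAMPLE_TIME, SAMPLE_RECORD]].freeze

# New assertions index events as [time, record].
NEW_STYLE_EVENTS = [[SAMPLE_TIME, SAMPLE_RECORD]].freeze

# Hypothetical helper: drop the leading tag so old-style events can be
# compared against new-style ones.
def drop_tag(events)
  events.map { |_tag, time, record| [time, record] }
end

puts drop_tag(OLD_STYLE_EVENTS) == NEW_STYLE_EVENTS
```

With the tag gone from each event, the removed `assert_equal 'test.message', filtered[0][0]` check has no counterpart in the new test.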
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-woothee
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - TAGOMORI Satoshi
@@ -42,14 +42,14 @@ dependencies:
42
42
  name: fluentd
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - "<"
45
+ - - ">="
46
46
  - !ruby/object:Gem::Version
47
47
  version: 0.14.0
48
48
  type: :runtime
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - "<"
52
+ - - ">="
53
53
  - !ruby/object:Gem::Version
54
54
  version: 0.14.0
55
55
  - !ruby/object:Gem::Dependency
@@ -81,10 +81,8 @@ files:
81
81
  - Rakefile
82
82
  - fluent-plugin-woothee.gemspec
83
83
  - lib/fluent/plugin/filter_woothee.rb
84
- - lib/fluent/plugin/out_woothee.rb
85
84
  - test/helper.rb
86
85
  - test/plugin/test_filter_woothee.rb
87
- - test/plugin/test_out_woothee.rb
88
86
  homepage: https://github.com/woothee/fluent-plugin-woothee
89
87
  licenses:
90
88
  - Apache-2.0
@@ -113,4 +111,3 @@ summary: Fluentd plugin to parse UserAgent strings with woothee parser. It adds
113
111
  test_files:
114
112
  - test/helper.rb
115
113
  - test/plugin/test_filter_woothee.rb
116
- - test/plugin/test_out_woothee.rb
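Note on the dependency change above: the fluentd requirement flips from `< 0.14.0` to `>= 0.14.0`, so the two releases accept disjoint sets of Fluentd versions. A small sketch using RubyGems' own `Gem::Requirement` shows the effect. The sample version strings are chosen only for illustration and are not taken from the diff.

```ruby
require 'rubygems'

# Requirement strings as they appear in the gem metadata before and after.
OLD_REQUIREMENT = Gem::Requirement.new('< 0.14.0')
NEW_REQUIREMENT = Gem::Requirement.new('>= 0.14.0')

# Illustrative Fluentd versions (assumed examples, not from the diff).
%w[0.12.43 0.14.0 1.16.0].each do |v|
  version = Gem::Version.new(v)
  old_ok = OLD_REQUIREMENT.satisfied_by?(version)
  new_ok = NEW_REQUIREMENT.satisfied_by?(version)
  puts "fluentd #{v}: 0.2.2 accepts=#{old_ok}, 1.0.0 accepts=#{new_ok}"
end
```

No single Fluentd version satisfies both requirements, which is consistent with the major version bump to 1.0.0.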
@@ -1,154 +0,0 @@
1
- class Fluent::WootheeOutput < Fluent::Output
2
- Fluent::Plugin.register_output('woothee', self)
3
- Fluent::Plugin.register_output('woothee_fast_crawler_filter', self)
4
-
5
- # Define `router` method of v0.12 to support v0.10 or earlier
6
- unless method_defined?(:router)
7
- define_method("router") { Fluent::Engine }
8
- end
9
-
10
- # Define `log` method for v0.10.42 or earlier
11
- unless method_defined?(:log)
12
- define_method("log") { $log }
13
- end
14
-
15
- config_param :tag, :string, :default => nil
16
- config_param :remove_prefix, :string, :default => nil
17
- config_param :add_prefix, :string, :default => nil
18
-
19
- config_param :fast_crawler_filter_mode, :bool, :default => false
20
-
21
- config_param :key_name, :string
22
-
23
- config_param :filter_categories, :default => [] do |val|
24
- val.split(',').map(&:to_sym)
25
- end
26
- config_param :drop_categories, :default => [] do |val|
27
- val.split(',').map(&:to_sym)
28
- end
29
- attr_accessor :mode
30
-
31
- config_param :merge_agent_info, :bool, :default => false
32
- config_param :out_key_name, :string, :default => 'agent_name'
33
- config_param :out_key_category, :string, :default => 'agent_category'
34
- config_param :out_key_os, :string, :default => 'agent_os'
35
- config_param :out_key_os_version, :string, :default => nil # suppress output
36
- config_param :out_key_version, :string, :default => nil # suppress output
37
- config_param :out_key_vendor, :string, :default => nil # suppress output
38
-
39
- def initialize
40
- super
41
- require 'woothee'
42
- end
43
-
44
- def configure(conf)
45
- super
46
-
47
- # tag ->
48
- if not @tag and not @remove_prefix and not @add_prefix
49
- raise Fluent::ConfigError, "missing both of remove_prefix and add_prefix"
50
- end
51
- if @tag and (@remove_prefix or @add_prefix)
52
- raise Fluent::ConfigError, "both of tag and remove_prefix/add_prefix must not be specified"
53
- end
54
- if @remove_prefix
55
- @removed_prefix_string = @remove_prefix + '.'
56
- @removed_length = @removed_prefix_string.length
57
- end
58
- if @add_prefix
59
- @added_prefix_string = @add_prefix + '.'
60
- end
61
- # <- tag
62
-
63
- if conf['type'] == 'woothee_fast_crawler_filter' or @fast_crawler_filter_mode
64
- @fast_crawler_filter_mode = true
65
-
66
- if @filter_categories.size > 0 or @drop_categories.size > 0 or @merge_agent_info
67
- raise Fluent::ConfigError, "fast_crawler_filter cannot be specified with filter/drop/merge options"
68
- end
69
-
70
- return
71
- end
72
-
73
- if @filter_categories.size > 0 and @drop_categories.size > 0
74
- raise Fluent::ConfigError, "both of 'filter' and 'drop' categories specified"
75
- elsif @filter_categories.size > 0
76
- unless @filter_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
77
- raise Fluent::ConfigError, "filter_categories has invalid category name"
78
- end
79
- @mode = :filter
80
- elsif @drop_categories.size > 0
81
- unless @drop_categories.reduce(true){|r,i| r and Woothee::CATEGORY_LIST.include?(i)}
82
- raise Fluent::ConfigError, "drop_categories has invalid category name"
83
- end
84
- @mode = :drop
85
- else
86
- @mode = :through
87
- end
88
-
89
- if @mode == :through and not @merge_agent_info
90
- raise Fluent::ConfigError, "configured not to do nothing (not to do either filter/drop nor addition of parser result)"
91
- end
92
- end
93
-
94
- def tag_mangle(tag)
95
- if @tag
96
- @tag
97
- else
98
- if @remove_prefix and
99
- ( (tag.start_with?(@removed_prefix_string) and tag.length > @removed_length) or tag == @remove_prefix)
100
- tag = tag[@removed_length..-1]
101
- end
102
- if @add_prefix
103
- tag = if tag and tag.length > 0
104
- @added_prefix_string + tag
105
- else
106
- @add_prefix
107
- end
108
- end
109
- tag
110
- end
111
- end
112
-
113
- def fast_crawler_filter_emit(tag, es)
114
- es.each do |time,record|
115
- unless Woothee.is_crawler(record[@key_name] || '')
116
- router.emit(tag, time, record)
117
- end
118
- end
119
- end
120
-
121
- def normal_emit(tag, es)
122
- es.each do |time,record|
123
- parsed = Woothee.parse(record[@key_name] || '')
124
-
125
- category = parsed[Woothee::ATTRIBUTE_CATEGORY]
126
- next if @mode == :filter and not @filter_categories.include?(category)
127
- next if @mode == :drop and @drop_categories.include?(category)
128
-
129
- if @merge_agent_info
130
- record = record.merge({
131
- @out_key_name => parsed[Woothee::ATTRIBUTE_NAME],
132
- @out_key_category => parsed[Woothee::ATTRIBUTE_CATEGORY].to_s,
133
- @out_key_os => parsed[Woothee::ATTRIBUTE_OS]
134
- })
135
- record[@out_key_os_version] = parsed[Woothee::ATTRIBUTE_OS_VERSION] if @out_key_os_version
136
- record[@out_key_version] = parsed[Woothee::ATTRIBUTE_VERSION] if @out_key_version
137
- record[@out_key_vendor] = parsed[Woothee::ATTRIBUTE_VENDOR] if @out_key_vendor
138
- end
139
- router.emit(tag, time, record)
140
- end
141
- end
142
-
143
- def emit(tag, es, chain)
144
- tag = tag_mangle(tag)
145
-
146
- if @fast_crawler_filter_mode
147
- fast_crawler_filter_emit(tag, es)
148
- else
149
- normal_emit(tag, es)
150
- end
151
-
152
- chain.next
153
- end
154
- end
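Note on the removed output plugin above: its `tag_mangle` method either returns a fixed `tag` or strips `remove_prefix` and prepends `add_prefix`. The following standalone sketch restates that logic as a plain function so it can be run without Fluentd. The method name `mangle_tag` and its keyword arguments are hypothetical; the expected results mirror the assertions in the removed `test_tag_mangle`.

```ruby
# Standalone restatement of the removed tag rewriting logic.
# `mangle_tag` and its keyword arguments are illustrative names only.
def mangle_tag(tag, fixed_tag: nil, remove_prefix: nil, add_prefix: nil)
  # A fixed tag overrides everything else.
  return fixed_tag if fixed_tag

  if remove_prefix
    prefix = "#{remove_prefix}."
    if (tag.start_with?(prefix) && tag.length > prefix.length) || tag == remove_prefix
      # When tag == remove_prefix the slice starts past the end and yields nil.
      tag = tag[prefix.length..-1]
    end
  end

  if add_prefix
    tag = tag && !tag.empty? ? "#{add_prefix}.#{tag}" : add_prefix
  end

  tag
end

%w[data test.data test.test.data test].each do |t|
  puts "#{t} -> #{mangle_tag(t, remove_prefix: 'test', add_prefix: 'merged')}"
end
```

Filter plugins leave the tag unchanged, which may explain why this tag handling disappears together with the output plugin.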
@@ -1,364 +0,0 @@
1
- require 'helper'
2
-
3
- class Fluent::WootheeOutputTest < Test::Unit::TestCase
4
- # fast crawler filter
5
- CONFIG0 = %[
6
- type woothee_fast_crawler_filter
7
- key_name useragent
8
- tag filtered
9
- ]
10
-
11
- # through & merge
12
- CONFIG1 = %[
13
- type woothee
14
- key_name agent
15
- remove_prefix test
16
- add_prefix merged
17
- merge_agent_info yes
18
- ]
19
-
20
- # filter & merge
21
- CONFIG2 = %[
22
- type woothee
23
- key_name agent
24
- filter_categories pc,smartphone,mobilephone,appliance
25
- remove_prefix test
26
- add_prefix merged
27
- merge_agent_info yes
28
- out_key_name ua_name
29
- out_key_category ua_category
30
- out_key_os ua_os
31
- out_key_os_version ua_os_version
32
- out_key_version ua_version
33
- out_key_vendor ua_vendor
34
- ]
35
-
36
- # drop & non-merge
37
- CONFIG3 = %[
38
- type woothee
39
- key_name user_agent
40
- drop_categories crawler,misc
41
- tag selected
42
- ]
43
-
44
- def setup
45
- Fluent::Test.setup
46
- end
47
-
48
- def create_driver(conf=CONFIG1,tag='test')
49
- Fluent::Test::OutputTestDriver.new(Fluent::WootheeOutput, tag).configure(conf)
50
- end
51
-
52
- def test_configure
53
- # fast_crawler_filter
54
- d = create_driver CONFIG0
55
- assert_equal true, d.instance.fast_crawler_filter_mode
56
- assert_equal 'useragent', d.instance.key_name
57
- assert_equal 'filtered', d.instance.tag
58
-
59
- # through & merge
60
- d = create_driver CONFIG1
61
- assert_equal false, d.instance.fast_crawler_filter_mode
62
- assert_equal 'agent', d.instance.key_name
63
- assert_equal 'test', d.instance.remove_prefix
64
- assert_equal 'merged', d.instance.add_prefix
65
-
66
- assert_equal 0, d.instance.filter_categories.size
67
- assert_equal 0, d.instance.drop_categories.size
68
- assert_equal :through, d.instance.mode
69
-
70
- assert_equal true, d.instance.merge_agent_info
71
- assert_equal 'agent_name', d.instance.out_key_name
72
- assert_equal 'agent_category', d.instance.out_key_category
73
- assert_equal 'agent_os', d.instance.out_key_os
74
- assert_nil d.instance.out_key_version
75
- assert_nil d.instance.out_key_vendor
76
-
77
- # filter & merge
78
- d = create_driver CONFIG2
79
- assert_equal false, d.instance.fast_crawler_filter_mode
80
- assert_equal 'agent', d.instance.key_name
81
- assert_equal 'test', d.instance.remove_prefix
82
- assert_equal 'merged', d.instance.add_prefix
83
-
84
- assert_equal 4, d.instance.filter_categories.size
85
- assert_equal [:pc,:smartphone,:mobilephone,:appliance], d.instance.filter_categories
86
- assert_equal 0, d.instance.drop_categories.size
87
- assert_equal :filter, d.instance.mode
88
-
89
- assert_equal true, d.instance.merge_agent_info
90
- assert_equal 'ua_name', d.instance.out_key_name
91
- assert_equal 'ua_category', d.instance.out_key_category
92
- assert_equal 'ua_os', d.instance.out_key_os
93
- assert_equal 'ua_os_version', d.instance.out_key_os_version
94
- assert_equal 'ua_version', d.instance.out_key_version
95
- assert_equal 'ua_vendor', d.instance.out_key_vendor
96
-
97
- # drop & non-merge
98
- d = create_driver CONFIG3
99
- assert_equal false, d.instance.fast_crawler_filter_mode
100
- assert_equal 'user_agent', d.instance.key_name
101
- assert_equal 'selected', d.instance.tag
102
-
103
- assert_equal 0, d.instance.filter_categories.size
104
- assert_equal 2, d.instance.drop_categories.size
105
- assert_equal [:crawler,:misc], d.instance.drop_categories
106
- assert_equal :drop, d.instance.mode
107
-
108
- assert_equal false, d.instance.merge_agent_info
109
- end
110
-
111
- def test_tag_mangle
112
- p = create_driver(CONFIG0).instance
113
- assert_equal 'filtered', p.tag_mangle('data')
114
- assert_equal 'filtered', p.tag_mangle('test.data')
115
- assert_equal 'filtered', p.tag_mangle('test.test.data')
116
- assert_equal 'filtered', p.tag_mangle('test')
117
-
118
- p = create_driver(CONFIG1).instance
119
- assert_equal 'merged.data', p.tag_mangle('data')
120
- assert_equal 'merged.data', p.tag_mangle('test.data')
121
- assert_equal 'merged.test.data', p.tag_mangle('test.test.data')
122
- assert_equal 'merged', p.tag_mangle('test')
123
-
124
- p = create_driver(CONFIG3).instance
125
- assert_equal 'selected', p.tag_mangle('data')
126
- assert_equal 'selected', p.tag_mangle('test.data')
127
- assert_equal 'selected', p.tag_mangle('test.test.data')
128
- assert_equal 'selected', p.tag_mangle('test')
129
- end
130
-
131
- def test_emit_fast_crawler_filter
132
- d = create_driver CONFIG0
133
- time = Time.parse('2012-07-20 16:19:00').to_i
134
- d.run do
135
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', 'value' => 1}, time)
136
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'value' => 2}, time)
137
- d.emit({'useragent' => 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', 'value' => 3}, time)
138
- d.emit({'useragent' => 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)', 'value' => 4}, time)
139
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)', 'value' => 5}, time)
140
- d.emit({'useragent' => 'Mozilla/5.0 (compatible; Rakutenbot/1.0; +http://dynamic.rakuten.co.jp/bot.html)', 'value' => 6}, time)
141
- d.emit({'useragent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; ja-jp) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1', 'value' => 7}, time)
142
- d.emit({'useragent' => 'Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)', 'value' => 8}, time)
143
- end
144
-
145
- emits = d.emits
146
- assert_equal 4, emits.size
147
-
148
- assert_equal 'filtered', emits[0][0]
149
- assert_equal time, emits[0][1]
150
- assert_equal 'Mozilla/5.0 (iPad; U; CPU OS 4_3_2 like Mac OS X; ja-jp) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5', emits[0][2]['useragent']
151
- assert_equal 3, emits[0][2]['value']
152
- assert_equal 2, emits[0][2].keys.size
153
-
154
- assert_equal 4, emits[1][2]['value']
155
- assert_equal 6, emits[2][2]['value']
156
- assert_equal 7, emits[3][2]['value']
157
- end
158
-
159
- # # through & merge
160
- def test_emit_through
161
- d = create_driver(CONFIG1, 'test.message')
162
- time = Time.parse('2012-07-20 16:40:30').to_i
163
- d.run do
164
- d.emit({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
165
- d.emit({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
166
- d.emit({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
167
- d.emit({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
168
- d.emit({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
169
- d.emit({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
170
- d.emit({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
171
- d.emit({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
172
- end
173
-
174
- emits = d.emits
175
- assert_equal 8, emits.size
176
- assert_equal 'merged.message', emits[0][0]
177
- assert_equal time, emits[0][1]
178
-
179
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
180
- m = emits[0][2]
181
- assert_equal 0, m['value']
182
- assert_equal 'Internet Explorer', m['agent_name']
183
- assert_equal 'pc', m['agent_category']
184
- assert_equal 'Windows 8', m['agent_os']
185
- assert_equal 5, m.keys.size
186
-
187
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
188
- m = emits[1][2]
189
- assert_equal 1, m['value']
190
- assert_equal 'Firefox', m['agent_name']
191
- assert_equal 'pc', m['agent_category']
192
- assert_equal 'Windows Vista', m['agent_os']
193
-
194
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
195
- m = emits[2][2]
196
- assert_equal 2, m['value']
197
- assert_equal 'Firefox', m['agent_name']
198
- assert_equal 'pc', m['agent_category']
199
- assert_equal 'Linux', m['agent_os']
200
-
201
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
202
- m = emits[3][2]
203
- assert_equal 3, m['value']
204
- assert_equal 'Safari', m['agent_name']
205
- assert_equal 'smartphone', m['agent_category']
206
- assert_equal 'Android', m['agent_os']
207
-
208
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
209
- m = emits[4][2]
210
- assert_equal 4, m['value']
211
- assert_equal 'docomo', m['agent_name']
212
- assert_equal 'mobilephone', m['agent_category']
213
- assert_equal 'docomo', m['agent_os']
214
-
215
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
216
- m = emits[5][2]
217
- assert_equal 5, m['value']
218
- assert_equal 'PlayStation Vita', m['agent_name']
219
- assert_equal 'appliance', m['agent_category']
220
- assert_equal 'PlayStation Vita', m['agent_os']
221
-
222
- # 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'
223
- m = emits[6][2]
224
- assert_equal 6, m['value']
225
- assert_equal 'Google Desktop', m['agent_name']
226
- assert_equal 'misc', m['agent_category']
227
- assert_equal 'UNKNOWN', m['agent_os']
228
-
229
- # 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'
230
- m = emits[7][2]
231
- assert_equal 7, m['value']
232
- assert_equal 'msnbot', m['agent_name']
233
- assert_equal 'crawler', m['agent_category']
234
- assert_equal 'UNKNOWN', m['agent_os']
235
- end
236
-
237
- # # filter & merge
238
- def test_emit_filter
239
- d = create_driver(CONFIG2, 'test.message')
240
- time = Time.parse('2012-07-20 16:40:30').to_i
241
- d.run do
242
- d.emit({'value' => 0, 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
243
- d.emit({'value' => 1, 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
244
- d.emit({'value' => 2, 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
245
- d.emit({'value' => 3, 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
246
- d.emit({'value' => 4, 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
247
- d.emit({'value' => 5, 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
248
- d.emit({'value' => 6, 'agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
249
- d.emit({'value' => 7, 'agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
250
- end
251
-
252
- emits = d.emits
253
- assert_equal 6, emits.size
254
- assert_equal 'merged.message', emits[0][0]
255
- assert_equal time, emits[0][1]
256
-
257
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
258
- m = emits[0][2]
259
- assert_equal 8, m.keys.size
260
- assert_equal 0, m['value']
261
- assert_equal 'Internet Explorer', m['ua_name']
262
- assert_equal 'pc', m['ua_category']
263
- assert_equal 'Windows 8', m['ua_os']
264
- assert_equal 'NT 6.2', m['ua_os_version']
265
- assert_equal 'Microsoft', m['ua_vendor']
266
- assert_equal '10.0', m['ua_version']
267
-
268
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
269
- m = emits[1][2]
270
- assert_equal 1, m['value']
271
- assert_equal 'Firefox', m['ua_name']
272
- assert_equal 'pc', m['ua_category']
273
- assert_equal 'Windows Vista', m['ua_os']
274
- assert_equal 'NT 6.0', m['ua_os_version']
275
- assert_equal 'Mozilla', m['ua_vendor']
276
- assert_equal '9.0.1', m['ua_version']
277
-
278
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
279
- m = emits[2][2]
280
- assert_equal 2, m['value']
281
- assert_equal 'Firefox', m['ua_name']
282
- assert_equal 'pc', m['ua_category']
283
- assert_equal 'Linux', m['ua_os']
284
- assert_equal 'UNKNOWN', m['ua_os_version']
285
- assert_equal 'Mozilla', m['ua_vendor']
286
- assert_equal '9.0.1', m['ua_version']
287
-
288
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
289
- m = emits[3][2]
290
- assert_equal 3, m['value']
291
- assert_equal 'Safari', m['ua_name']
292
- assert_equal 'smartphone', m['ua_category']
293
- assert_equal 'Android', m['ua_os']
294
- assert_equal '3.1', m['ua_os_version']
295
- assert_equal 'Apple', m['ua_vendor']
296
- assert_equal '4.0', m['ua_version']
297
-
298
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
299
- m = emits[4][2]
300
- assert_equal 4, m['value']
301
- assert_equal 'docomo', m['ua_name']
302
- assert_equal 'mobilephone', m['ua_category']
303
- assert_equal 'docomo', m['ua_os']
304
- assert_equal 'UNKNOWN', m['ua_os_version']
305
- assert_equal 'docomo', m['ua_vendor']
306
- assert_equal 'N505i', m['ua_version']
307
-
308
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
309
- m = emits[5][2]
310
- assert_equal 5, m['value']
311
- assert_equal 'PlayStation Vita', m['ua_name']
312
- assert_equal 'appliance', m['ua_category']
313
- assert_equal 'PlayStation Vita', m['ua_os']
314
- assert_equal '1.51', m['ua_os_version']
315
- assert_equal 'Sony', m['ua_vendor']
316
- assert_equal 'UNKNOWN', m['ua_version']
317
- end
318
-
319
- # # drop & non-merge
320
- def test_emit_drop
321
- d = create_driver(CONFIG3, 'test.message')
322
- time = Time.parse('2012-07-20 16:40:30').to_i
323
- d.run do
324
- d.emit({'value' => 0, 'user_agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'}, time)
325
- d.emit({'value' => 1, 'user_agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
326
- d.emit({'value' => 2, 'user_agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'}, time)
327
- d.emit({'value' => 3, 'user_agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'}, time)
328
- d.emit({'value' => 4, 'user_agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'}, time)
329
- d.emit({'value' => 5, 'user_agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'}, time)
330
- d.emit({'value' => 6, 'user_agent' => 'Mozilla/5.0 (compatible; Google Desktop/5.9.1005.12335; http://desktop.google.com/)'}, time)
331
- d.emit({'value' => 7, 'user_agent' => 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)'}, time)
332
- end
333
-
334
- emits = d.emits
335
- assert_equal 6, emits.size
336
- assert_equal 'selected', emits[0][0]
337
- assert_equal time, emits[0][1]
338
-
339
- # 'agent' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)'
340
- m = emits[0][2]
341
- assert_equal 0, m['value']
342
- assert_equal 2, m.keys.size
343
-
344
- # 'agent' => 'Mozilla/5.0 (Windows NT 6.0; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
345
- m = emits[1][2]
346
- assert_equal 1, m['value']
347
-
348
- # 'agent' => 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
349
- m = emits[2][2]
350
- assert_equal 2, m['value']
351
-
352
- # 'agent' => 'Mozilla/5.0 (Linux; U; Android 3.1; ja-jp; L-06C Build/HMJ37) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
353
- m = emits[3][2]
354
- assert_equal 3, m['value']
355
-
356
- # 'agent' => 'DoCoMo/1.0/N505i/c20/TB/W24H12'
357
- m = emits[4][2]
358
- assert_equal 4, m['value']
359
-
360
- # 'agent' => 'Mozilla/5.0 (PlayStation Vita 1.51) AppleWebKit/531.22.8 (KHTML, like Gecko) Silk/3.2'
361
- m = emits[5][2]
362
- assert_equal 5, m['value']
363
- end
364
- end
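Note on the removed tests above: both `test_emit_filter` and `test_emit_drop` expect 6 of the 8 sample events to survive. The sketch below reproduces that count in plain Ruby, assuming the categories asserted in `test_emit_through` (pc, pc, pc, smartphone, mobilephone, appliance, misc, crawler). It does not call woothee; `keep_category?` is a hypothetical helper that mirrors the two `next if` checks in the removed `normal_emit`.

```ruby
# Categories asserted for the eight sample user agents, in order of 'value'.
SAMPLE_CATEGORIES = %i[pc pc pc smartphone mobilephone appliance misc crawler].freeze

# Hypothetical helper mirroring the removed filter/drop checks:
# in filter mode keep only listed categories, in drop mode discard listed ones.
def keep_category?(category, filter_categories: [], drop_categories: [])
  return filter_categories.include?(category) unless filter_categories.empty?
  return !drop_categories.include?(category) unless drop_categories.empty?

  true
end

filtered = SAMPLE_CATEGORIES.select do |c|
  keep_category?(c, filter_categories: %i[pc smartphone mobilephone appliance])
end
dropped = SAMPLE_CATEGORIES.select do |c|
  keep_category?(c, drop_categories: %i[crawler misc])
end

puts "filter mode keeps #{filtered.size}, drop mode keeps #{dropped.size}"
```

For this sample set the two configurations are complementary, so both keep the same six events.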