fluent-plugin-prometheus 1.5.0 → 1.6.0
Sign up to get free protection for your applications and to get access to all the features.
- checksums.yaml +4 -4
- data/.travis.yml +3 -3
- data/README.md +31 -17
- data/fluent-plugin-prometheus.gemspec +1 -1
- data/lib/fluent/plugin/filter_prometheus.rb +2 -1
- data/lib/fluent/plugin/in_prometheus.rb +50 -0
- data/lib/fluent/plugin/in_prometheus_monitor.rb +17 -2
- data/lib/fluent/plugin/in_prometheus_output_monitor.rb +37 -6
- data/lib/fluent/plugin/in_prometheus_tail_monitor.rb +2 -1
- data/lib/fluent/plugin/out_prometheus.rb +2 -1
- data/lib/fluent/plugin/prometheus.rb +25 -0
- data/lib/fluent/plugin/prometheus_metrics.rb +77 -0
- data/misc/prometheus_alerts.yaml +1 -1
- data/spec/fluent/plugin/prometheus_metrics_spec.rb +138 -0
- metadata +7 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3f079822fa7231b77912e5ac1324b691f38dbf1f1f32fd8700dee19bc636fd8c
|
4
|
+
data.tar.gz: 58ad623a333cc1600e971fdd35b9b2c7d4a381c03a954d99206233758e4562f6
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 4394983e398e9e4b116f54927b7ad82778ab672b27e41bfc1ef3eb5421f3f3cde7899ce906d86de8862179c0293c1831c0a555553f21c10787dddba22c68fe1e
|
7
|
+
data.tar.gz: f8d52d61a493f4c8abfc88a1cfdc415779a4946eeed639b03545574b2a86304d7a560c026391897ab93596280be306a02c24d3dcdbc161c30f625c211d237711
|
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -58,18 +58,25 @@ More configuration parameters:
|
|
58
58
|
- `bind`: binding interface (default: '0.0.0.0')
|
59
59
|
- `port`: listen port (default: 24231)
|
60
60
|
- `metrics_path`: metrics HTTP endpoint (default: /metrics)
|
61
|
+
- `aggregated_metrics_path`: metrics HTTP endpoint (default: /aggregated_metrics)
|
61
62
|
|
62
63
|
When using multiple workers, each worker binds to port + `fluent_worker_id`.
|
64
|
+
To scrape metrics from all workers at once, you can access http://localhost:24231/aggregated_metrics.
|
63
65
|
|
64
66
|
### prometheus_monitor input plugin
|
65
67
|
|
66
68
|
This plugin collects internal metrics in Fluentd. The metrics are similar to/part of [monitor_agent](https://docs.fluentd.org/input/monitor_agent).
|
67
69
|
|
68
|
-
Current exposed metrics:
|
69
70
|
|
70
|
-
|
71
|
-
|
72
|
-
- `
|
71
|
+
#### Exposed metrics
|
72
|
+
|
73
|
+
- `fluentd_status_buffer_queue_length`
|
74
|
+
- `fluentd_status_buffer_total_queued_size`
|
75
|
+
- `fluentd_status_retry_count`
|
76
|
+
- `fluentd_status_buffer_newest_timekey` from fluentd v1.4.2
|
77
|
+
- `fluentd_status_buffer_oldest_timekey` from fluentd v1.4.2
|
78
|
+
|
79
|
+
#### Configuration
|
73
80
|
|
74
81
|
With following configuration, those metrics are collected.
|
75
82
|
|
@@ -86,28 +93,35 @@ More configuration parameters:
|
|
86
93
|
|
87
94
|
### prometheus_output_monitor input plugin
|
88
95
|
|
89
|
-
**experimental**
|
90
|
-
|
91
96
|
This plugin collects internal metrics for output plugin in Fluentd. This is similar to `prometheus_monitor` plugin, but specialized for output plugin. There are Many metrics `prometheus_monitor` does not include, such as `num_errors`, `retry_wait` and so on.
|
92
97
|
|
93
|
-
|
98
|
+
#### Exposed metrics
|
99
|
+
|
100
|
+
Metrics for output
|
94
101
|
|
95
|
-
- `fluentd_output_status_buffer_queue_length`
|
96
|
-
- `fluentd_output_status_buffer_total_bytes`
|
97
102
|
- `fluentd_output_status_retry_count`
|
98
103
|
- `fluentd_output_status_num_errors`
|
99
104
|
- `fluentd_output_status_emit_count`
|
100
|
-
- `fluentd_output_status_flush_time_count`
|
101
|
-
- `fluentd_output_status_slow_flush_count`
|
102
105
|
- `fluentd_output_status_retry_wait`
|
103
106
|
- current retry_wait computed from last retry time and next retry time
|
104
107
|
- `fluentd_output_status_emit_records`
|
105
|
-
- only for v0.14
|
106
108
|
- `fluentd_output_status_write_count`
|
107
|
-
- only for v0.14
|
108
109
|
- `fluentd_output_status_rollback_count`
|
109
|
-
|
110
|
+
- `fluentd_output_status_flush_time_count` from fluentd v1.6.0
|
111
|
+
- `fluentd_output_status_slow_flush_count` from fluentd v1.6.0
|
110
112
|
|
113
|
+
Metrics for buffer
|
114
|
+
|
115
|
+
- `fluentd_output_status_buffer_total_bytes`
|
116
|
+
- `fluentd_output_status_buffer_stage_length` from fluentd v1.6.0
|
117
|
+
- `fluentd_output_status_buffer_stage_byte_size` from fluentd v1.6.0
|
118
|
+
- `fluentd_output_status_buffer_queue_length`
|
119
|
+
- `fluentd_output_status_buffer_queue_byte_size` from fluentd v1.6.0
|
120
|
+
- `fluentd_output_status_buffer_newest_timekey` from fluentd v1.6.0
|
121
|
+
- `fluentd_output_status_buffer_oldest_timekey` from fluentd v1.6.0
|
122
|
+
- `fluentd_output_status_buffer_available_space_ratio` from fluentd v1.6.0
|
123
|
+
|
124
|
+
#### Configuration
|
111
125
|
|
112
126
|
With following configuration, those metrics are collected.
|
113
127
|
|
@@ -124,13 +138,11 @@ More configuration parameters:
|
|
124
138
|
|
125
139
|
### prometheus_tail_monitor input plugin
|
126
140
|
|
127
|
-
**experimental**
|
128
|
-
|
129
141
|
This plugin collects internal metrics for in_tail plugin in Fluentd. in_tail plugin holds internal state for files that the plugin is watching. The state is sometimes important to monitor plugins work correctly.
|
130
142
|
|
131
143
|
This plugin uses internal class of Fluentd, so it's easy to break.
|
132
144
|
|
133
|
-
|
145
|
+
#### Exposed metrics
|
134
146
|
|
135
147
|
- `fluentd_tail_file_position`
|
136
148
|
- Current bytes which plugin reads from the file
|
@@ -143,6 +155,8 @@ Default labels:
|
|
143
155
|
- `type`: plugin name. `in_tail` only for now.
|
144
156
|
- `path`: file path
|
145
157
|
|
158
|
+
#### Configuration
|
159
|
+
|
146
160
|
With following configuration, those metrics are collected.
|
147
161
|
|
148
162
|
```
|
@@ -1,6 +1,6 @@
|
|
1
1
|
Gem::Specification.new do |spec|
|
2
2
|
spec.name = "fluent-plugin-prometheus"
|
3
|
-
spec.version = "1.
|
3
|
+
spec.version = "1.6.0"
|
4
4
|
spec.authors = ["Masahiro Sano"]
|
5
5
|
spec.email = ["sabottenda@gmail.com"]
|
6
6
|
spec.summary = %q{A fluent plugin that collects metrics and exposes for Prometheus.}
|
@@ -4,6 +4,7 @@ require 'fluent/plugin/filter'
|
|
4
4
|
module Fluent::Plugin
|
5
5
|
class PrometheusFilter < Fluent::Plugin::Filter
|
6
6
|
Fluent::Plugin.register_filter('prometheus', self)
|
7
|
+
include Fluent::Plugin::PrometheusLabelParser
|
7
8
|
include Fluent::Plugin::Prometheus
|
8
9
|
|
9
10
|
def initialize
|
@@ -17,7 +18,7 @@ module Fluent::Plugin
|
|
17
18
|
|
18
19
|
def configure(conf)
|
19
20
|
super
|
20
|
-
labels =
|
21
|
+
labels = parse_labels_elements(conf)
|
21
22
|
@metrics = Fluent::Plugin::Prometheus.parse_metrics_elements(conf, @registry, labels)
|
22
23
|
end
|
23
24
|
|
@@ -1,5 +1,7 @@
|
|
1
1
|
require 'fluent/plugin/input'
|
2
2
|
require 'fluent/plugin/prometheus'
|
3
|
+
require 'fluent/plugin/prometheus_metrics'
|
4
|
+
require 'net/http'
|
3
5
|
require 'webrick'
|
4
6
|
|
5
7
|
module Fluent::Plugin
|
@@ -11,6 +13,7 @@ module Fluent::Plugin
|
|
11
13
|
config_param :bind, :string, default: '0.0.0.0'
|
12
14
|
config_param :port, :integer, default: 24231
|
13
15
|
config_param :metrics_path, :string, default: '/metrics'
|
16
|
+
config_param :aggregated_metrics_path, :string, default: '/aggregated_metrics'
|
14
17
|
|
15
18
|
desc 'Enable ssl configuration for the server'
|
16
19
|
config_section :ssl, required: false, multi: false do
|
@@ -31,6 +34,10 @@ module Fluent::Plugin
|
|
31
34
|
|
32
35
|
attr_reader :registry
|
33
36
|
|
37
|
+
attr_reader :num_workers
|
38
|
+
attr_reader :base_port
|
39
|
+
attr_reader :metrics_path
|
40
|
+
|
34
41
|
def initialize
|
35
42
|
super
|
36
43
|
@registry = ::Prometheus::Client.registry
|
@@ -38,6 +45,18 @@ module Fluent::Plugin
|
|
38
45
|
|
39
46
|
def configure(conf)
|
40
47
|
super
|
48
|
+
|
49
|
+
# Get how many workers we have
|
50
|
+
sysconf = if self.respond_to?(:owner) && owner.respond_to?(:system_config)
|
51
|
+
owner.system_config
|
52
|
+
elsif self.respond_to?(:system_config)
|
53
|
+
self.system_config
|
54
|
+
else
|
55
|
+
nil
|
56
|
+
end
|
57
|
+
@num_workers = sysconf && sysconf.workers ? sysconf.workers : 1
|
58
|
+
|
59
|
+
@base_port = @port
|
41
60
|
@port += fluentd_worker_id
|
42
61
|
end
|
43
62
|
|
@@ -82,6 +101,7 @@ module Fluent::Plugin
|
|
82
101
|
|
83
102
|
@server = WEBrick::HTTPServer.new(config)
|
84
103
|
@server.mount(@metrics_path, MonitorServlet, self)
|
104
|
+
@server.mount(@aggregated_metrics_path, MonitorServletAll, self)
|
85
105
|
thread_create(:in_prometheus) do
|
86
106
|
@server.start
|
87
107
|
end
|
@@ -110,5 +130,35 @@ module Fluent::Plugin
|
|
110
130
|
res.body = $!.to_s
|
111
131
|
end
|
112
132
|
end
|
133
|
+
|
134
|
+
class MonitorServletAll < WEBrick::HTTPServlet::AbstractServlet
|
135
|
+
def initialize(server, prometheus)
|
136
|
+
@prometheus = prometheus
|
137
|
+
end
|
138
|
+
|
139
|
+
def do_GET(req, res)
|
140
|
+
res.status = 200
|
141
|
+
res['Content-Type'] = ::Prometheus::Client::Formats::Text::CONTENT_TYPE
|
142
|
+
|
143
|
+
full_result = PromMetricsAggregator.new
|
144
|
+
fluent_server_ip = @prometheus.bind == '0.0.0.0' ? '127.0.0.1' : @prometheus.bind
|
145
|
+
current_worker = 0
|
146
|
+
while current_worker < @prometheus.num_workers
|
147
|
+
Net::HTTP.start(fluent_server_ip, @prometheus.base_port + current_worker) do |http|
|
148
|
+
req = Net::HTTP::Get.new(@prometheus.metrics_path)
|
149
|
+
result = http.request(req)
|
150
|
+
if result.is_a?(Net::HTTPSuccess)
|
151
|
+
full_result.add_metrics(result.body)
|
152
|
+
end
|
153
|
+
end
|
154
|
+
current_worker += 1
|
155
|
+
end
|
156
|
+
res.body = full_result.get_metrics
|
157
|
+
rescue
|
158
|
+
res.status = 500
|
159
|
+
res['Content-Type'] = 'text/plain'
|
160
|
+
res.body = $!.to_s
|
161
|
+
end
|
162
|
+
end
|
113
163
|
end
|
114
164
|
end
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusMonitorInput < Fluent::Plugin::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -25,7 +26,7 @@ module Fluent::Plugin
|
|
25
26
|
hostname = Socket.gethostname
|
26
27
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
27
28
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
28
|
-
@base_labels =
|
29
|
+
@base_labels = parse_labels_elements(conf)
|
29
30
|
@base_labels.each do |key, value|
|
30
31
|
unless value.is_a?(String)
|
31
32
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_monitor"
|
@@ -45,6 +46,12 @@ module Fluent::Plugin
|
|
45
46
|
def start
|
46
47
|
super
|
47
48
|
|
49
|
+
@buffer_newest_timekey = @registry.gauge(
|
50
|
+
:fluentd_status_buffer_newest_timekey,
|
51
|
+
'Newest timekey in buffer.')
|
52
|
+
@buffer_oldest_timekey = @registry.gauge(
|
53
|
+
:fluentd_status_buffer_oldest_timekey,
|
54
|
+
'Oldest timekey in buffer.')
|
48
55
|
buffer_queue_length = @registry.gauge(
|
49
56
|
:fluentd_status_buffer_queue_length,
|
50
57
|
'Current buffer queue length.')
|
@@ -65,11 +72,19 @@ module Fluent::Plugin
|
|
65
72
|
|
66
73
|
def update_monitor_info
|
67
74
|
@monitor_agent.plugins_info_all.each do |info|
|
75
|
+
label = labels(info)
|
76
|
+
|
68
77
|
@monitor_info.each do |name, metric|
|
69
78
|
if info[name]
|
70
|
-
metric.set(
|
79
|
+
metric.set(label, info[name])
|
71
80
|
end
|
72
81
|
end
|
82
|
+
|
83
|
+
timekeys = info["buffer_timekeys"]
|
84
|
+
if timekeys && !timekeys.empty?
|
85
|
+
@buffer_newest_timekey.set(label, timekeys.max)
|
86
|
+
@buffer_oldest_timekey.set(label, timekeys.min)
|
87
|
+
end
|
73
88
|
end
|
74
89
|
end
|
75
90
|
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusOutputMonitorInput < Fluent::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_output_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -44,7 +45,7 @@ module Fluent::Plugin
|
|
44
45
|
hostname = Socket.gethostname
|
45
46
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
46
47
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
47
|
-
@base_labels =
|
48
|
+
@base_labels = parse_labels_elements(conf)
|
48
49
|
@base_labels.each do |key, value|
|
49
50
|
unless value.is_a?(String)
|
50
51
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_output_monitor"
|
@@ -64,12 +65,33 @@ module Fluent::Plugin
|
|
64
65
|
super
|
65
66
|
|
66
67
|
@metrics = {
|
67
|
-
|
68
|
-
:fluentd_output_status_buffer_queue_length,
|
69
|
-
'Current buffer queue length.'),
|
68
|
+
# Buffer metrics
|
70
69
|
buffer_total_queued_size: @registry.gauge(
|
71
70
|
:fluentd_output_status_buffer_total_bytes,
|
72
|
-
'Current total size of
|
71
|
+
'Current total size of stage and queue buffers.'),
|
72
|
+
buffer_stage_length: @registry.gauge(
|
73
|
+
:fluentd_output_status_buffer_stage_length,
|
74
|
+
'Current length of stage buffers.'),
|
75
|
+
buffer_stage_byte_size: @registry.gauge(
|
76
|
+
:fluentd_output_status_buffer_stage_byte_size,
|
77
|
+
'Current total size of stage buffers.'),
|
78
|
+
buffer_queue_length: @registry.gauge(
|
79
|
+
:fluentd_output_status_buffer_queue_length,
|
80
|
+
'Current length of queue buffers.'),
|
81
|
+
buffer_queue_byte_size: @registry.gauge(
|
82
|
+
:fluentd_output_status_queue_byte_size,
|
83
|
+
'Current total size of queue buffers.'),
|
84
|
+
buffer_available_buffer_space_ratios: @registry.gauge(
|
85
|
+
:fluentd_output_status_buffer_available_space_ratio,
|
86
|
+
'Ratio of available space in buffer.'),
|
87
|
+
buffer_newest_timekey: @registry.gauge(
|
88
|
+
:fluentd_output_status_buffer_newest_timekey,
|
89
|
+
'Newest timekey in buffer.'),
|
90
|
+
buffer_oldest_timekey: @registry.gauge(
|
91
|
+
:fluentd_output_status_buffer_oldest_timekey,
|
92
|
+
'Oldest timekey in buffer.'),
|
93
|
+
|
94
|
+
# Output metrics
|
73
95
|
retry_counts: @registry.gauge(
|
74
96
|
:fluentd_output_status_retry_count,
|
75
97
|
'Current retry counts.'),
|
@@ -112,8 +134,17 @@ module Fluent::Plugin
|
|
112
134
|
}
|
113
135
|
|
114
136
|
monitor_info = {
|
115
|
-
|
137
|
+
# buffer metrics
|
116
138
|
'buffer_total_queued_size' => @metrics[:buffer_total_queued_size],
|
139
|
+
'buffer_stage_length' => @metrics[:buffer_stage_length],
|
140
|
+
'buffer_stage_byte_size' => @metrics[:buffer_stage_byte_size],
|
141
|
+
'buffer_queue_length' => @metrics[:buffer_queue_length],
|
142
|
+
'buffer_queue_byte_size' => @metrics[:buffer_queue_byte_size],
|
143
|
+
'buffer_available_buffer_space_ratios' => @metrics[:buffer_available_buffer_space_ratios],
|
144
|
+
'buffer_newest_timekey' => @metrics[:buffer_newest_timekey],
|
145
|
+
'buffer_oldest_timekey' => @metrics[:buffer_oldest_timekey],
|
146
|
+
|
147
|
+
# output metrics
|
117
148
|
'retry_count' => @metrics[:retry_counts],
|
118
149
|
}
|
119
150
|
instance_vars_info = {
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusTailMonitorInput < Fluent::Plugin::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_tail_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -29,7 +30,7 @@ module Fluent::Plugin
|
|
29
30
|
hostname = Socket.gethostname
|
30
31
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
31
32
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
32
|
-
@base_labels =
|
33
|
+
@base_labels = parse_labels_elements(conf)
|
33
34
|
@base_labels.each do |key, value|
|
34
35
|
unless value.is_a?(String)
|
35
36
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_tail_monitor"
|
@@ -4,6 +4,7 @@ require 'fluent/plugin/prometheus'
|
|
4
4
|
module Fluent::Plugin
|
5
5
|
class PrometheusOutput < Fluent::Plugin::Output
|
6
6
|
Fluent::Plugin.register_output('prometheus', self)
|
7
|
+
include Fluent::Plugin::PrometheusLabelParser
|
7
8
|
include Fluent::Plugin::Prometheus
|
8
9
|
|
9
10
|
def initialize
|
@@ -17,7 +18,7 @@ module Fluent::Plugin
|
|
17
18
|
|
18
19
|
def configure(conf)
|
19
20
|
super
|
20
|
-
labels =
|
21
|
+
labels = parse_labels_elements(conf)
|
21
22
|
@metrics = Fluent::Plugin::Prometheus.parse_metrics_elements(conf, @registry, labels)
|
22
23
|
end
|
23
24
|
|
@@ -4,6 +4,31 @@ require 'fluent/plugin/filter_record_transformer'
|
|
4
4
|
|
5
5
|
module Fluent
|
6
6
|
module Plugin
|
7
|
+
module PrometheusLabelParser
|
8
|
+
def configure(conf)
|
9
|
+
super
|
10
|
+
# Check if running with multiple workers
|
11
|
+
sysconf = if self.respond_to?(:owner) && owner.respond_to?(:system_config)
|
12
|
+
owner.system_config
|
13
|
+
elsif self.respond_to?(:system_config)
|
14
|
+
self.system_config
|
15
|
+
else
|
16
|
+
nil
|
17
|
+
end
|
18
|
+
@multi_worker = sysconf && sysconf.workers ? (sysconf.workers > 1) : false
|
19
|
+
end
|
20
|
+
|
21
|
+
def parse_labels_elements(conf)
|
22
|
+
base_labels = Fluent::Plugin::Prometheus.parse_labels_elements(conf)
|
23
|
+
|
24
|
+
if @multi_worker
|
25
|
+
base_labels[:worker_id] = fluentd_worker_id.to_s
|
26
|
+
end
|
27
|
+
|
28
|
+
base_labels
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
7
32
|
module Prometheus
|
8
33
|
class AlreadyRegisteredError < StandardError; end
|
9
34
|
|
@@ -0,0 +1,77 @@
|
|
1
|
+
module Fluent::Plugin
|
2
|
+
|
3
|
+
##
|
4
|
+
# PromMetricsAggregator aggregates multiples metrics exposed using Prometheus text-based format
|
5
|
+
# see https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md
|
6
|
+
|
7
|
+
|
8
|
+
class PrometheusMetrics
|
9
|
+
def initialize
|
10
|
+
@comments = []
|
11
|
+
@metrics = []
|
12
|
+
end
|
13
|
+
|
14
|
+
def to_string
|
15
|
+
(@comments + @metrics).join("\n")
|
16
|
+
end
|
17
|
+
|
18
|
+
def add_comment(comment)
|
19
|
+
@comments << comment
|
20
|
+
end
|
21
|
+
|
22
|
+
def add_metric_value(value)
|
23
|
+
@metrics << value
|
24
|
+
end
|
25
|
+
|
26
|
+
attr_writer :comments, :metrics
|
27
|
+
end
|
28
|
+
|
29
|
+
class PromMetricsAggregator
|
30
|
+
def initialize
|
31
|
+
@metrics = {}
|
32
|
+
end
|
33
|
+
|
34
|
+
def get_metric_name_from_comment(line)
|
35
|
+
tokens = line.split(' ')
|
36
|
+
if ['HELP', 'TYPE'].include?(tokens[1])
|
37
|
+
tokens[2]
|
38
|
+
else
|
39
|
+
''
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
def add_metrics(metrics)
|
44
|
+
current_metric = ''
|
45
|
+
new_metric = false
|
46
|
+
lines = metrics.split("\n")
|
47
|
+
for line in lines
|
48
|
+
if line[0] == '#'
|
49
|
+
# Metric comment (# TYPE, # HELP)
|
50
|
+
parsed_metric = get_metric_name_from_comment(line)
|
51
|
+
if parsed_metric != ''
|
52
|
+
if parsed_metric != current_metric
|
53
|
+
# Starting a new metric comment block
|
54
|
+
new_metric = !@metrics.key?(parsed_metric)
|
55
|
+
if new_metric
|
56
|
+
@metrics[parsed_metric] = PrometheusMetrics.new()
|
57
|
+
end
|
58
|
+
current_metric = parsed_metric
|
59
|
+
end
|
60
|
+
|
61
|
+
if new_metric && parsed_metric == current_metric
|
62
|
+
# New metric, inject comments (# TYPE, # HELP)
|
63
|
+
@metrics[parsed_metric].add_comment(line)
|
64
|
+
end
|
65
|
+
end
|
66
|
+
else
|
67
|
+
# Metric value, simply append line
|
68
|
+
@metrics[current_metric].add_metric_value(line)
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
def get_metrics
|
74
|
+
@metrics.map{|k,v| v.to_string()}.join("\n") + (@metrics.length ? "\n" : "")
|
75
|
+
end
|
76
|
+
end
|
77
|
+
end
|
data/misc/prometheus_alerts.yaml
CHANGED
@@ -47,7 +47,7 @@ ALERT FluentdQueueLength
|
|
47
47
|
}
|
48
48
|
|
49
49
|
ALERT FluentdRecordsCountsHigh
|
50
|
-
IF sum(rate(
|
50
|
+
IF sum(rate(fluentd_output_status_emit_records{job="fluentd"}[5m])) BY (instance) > (3 * sum(rate(fluentd_output_status_emit_records{job="fluentd"}[15m])) BY (instance))
|
51
51
|
FOR 1m
|
52
52
|
LABELS {
|
53
53
|
service = "fluentd",
|
@@ -0,0 +1,138 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
require 'fluent/plugin/in_prometheus'
|
3
|
+
require 'fluent/test/driver/input'
|
4
|
+
|
5
|
+
require 'net/http'
|
6
|
+
|
7
|
+
describe Fluent::Plugin::PromMetricsAggregator do
|
8
|
+
|
9
|
+
metrics_worker_1 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
10
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
11
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
12
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
13
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
14
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
15
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
16
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
17
|
+
# TYPE log_counter counter
|
18
|
+
# HELP log_counter the number of received logs
|
19
|
+
log_counter{worker_id="0",host="0123456789ab",tag="fluent.info"} 1.0
|
20
|
+
# HELP empty_metric A metric with no data
|
21
|
+
# TYPE empty_metric gauge
|
22
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
23
|
+
# TYPE http_request_duration_seconds histogram
|
24
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.005"} 58
|
25
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.01"} 58
|
26
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.05"} 59
|
27
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.1"} 59
|
28
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="1"} 59
|
29
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="10"} 59
|
30
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="+Inf"} 59
|
31
|
+
http_request_duration_seconds_sum{code="200",worker_id="0",method="GET"} 0.05046115500000003
|
32
|
+
http_request_duration_seconds_count{code="200",worker_id="0",method="GET"} 59
|
33
|
+
]
|
34
|
+
|
35
|
+
metrics_worker_2 = %[# TYPE fluentd_output_status_buffer_queue_length gauge
|
36
|
+
# HELP fluentd_output_status_buffer_queue_length Current buffer queue length.
|
37
|
+
fluentd_output_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",type="s3"} 0.0
|
38
|
+
fluentd_output_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",type="s3"} 0.0
|
39
|
+
# TYPE fluentd_output_status_buffer_total_bytes gauge
|
40
|
+
# HELP fluentd_output_status_buffer_total_bytes Current total size of queued buffers.
|
41
|
+
fluentd_output_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",type="s3"} 0.0
|
42
|
+
fluentd_output_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",type="s3"} 0.0
|
43
|
+
]
|
44
|
+
|
45
|
+
metrics_worker_3 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
46
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
47
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
48
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
49
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
50
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
51
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
52
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
53
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
54
|
+
# TYPE http_request_duration_seconds histogram
|
55
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.005"} 70
|
56
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.01"} 70
|
57
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.05"} 71
|
58
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.1"} 71
|
59
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="1"} 71
|
60
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="10"} 71
|
61
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="+Inf"} 71
|
62
|
+
http_request_duration_seconds_sum{code="200",worker_id="1",method="GET"} 0.05646315600000003
|
63
|
+
http_request_duration_seconds_count{code="200",worker_id="1",method="GET"} 71
|
64
|
+
]
|
65
|
+
|
66
|
+
metrics_merged_1_and_3 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
67
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
68
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
69
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
70
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
71
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
72
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
73
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
74
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
75
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
76
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
77
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
78
|
+
# TYPE log_counter counter
|
79
|
+
# HELP log_counter the number of received logs
|
80
|
+
log_counter{worker_id="0",host="0123456789ab",tag="fluent.info"} 1.0
|
81
|
+
# HELP empty_metric A metric with no data
|
82
|
+
# TYPE empty_metric gauge
|
83
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
84
|
+
# TYPE http_request_duration_seconds histogram
|
85
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.005"} 58
|
86
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.01"} 58
|
87
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.05"} 59
|
88
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.1"} 59
|
89
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="1"} 59
|
90
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="10"} 59
|
91
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="+Inf"} 59
|
92
|
+
http_request_duration_seconds_sum{code="200",worker_id="0",method="GET"} 0.05046115500000003
|
93
|
+
http_request_duration_seconds_count{code="200",worker_id="0",method="GET"} 59
|
94
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.005"} 70
|
95
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.01"} 70
|
96
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.05"} 71
|
97
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.1"} 71
|
98
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="1"} 71
|
99
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="10"} 71
|
100
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="+Inf"} 71
|
101
|
+
http_request_duration_seconds_sum{code="200",worker_id="1",method="GET"} 0.05646315600000003
|
102
|
+
http_request_duration_seconds_count{code="200",worker_id="1",method="GET"} 71
|
103
|
+
]
|
104
|
+
|
105
|
+
describe 'add_metrics' do
|
106
|
+
context '1st_metrics' do
|
107
|
+
it 'adds all fields' do
|
108
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
109
|
+
all_metrics.add_metrics(metrics_worker_1)
|
110
|
+
result_str = all_metrics.get_metrics
|
111
|
+
|
112
|
+
expect(result_str).to eq(metrics_worker_1)
|
113
|
+
end
|
114
|
+
end
|
115
|
+
context '2nd_metrics' do
|
116
|
+
it 'append new metrics' do
|
117
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
118
|
+
all_metrics.add_metrics(metrics_worker_1)
|
119
|
+
all_metrics.add_metrics(metrics_worker_2)
|
120
|
+
result_str = all_metrics.get_metrics
|
121
|
+
|
122
|
+
expect(result_str).to eq(metrics_worker_1 + metrics_worker_2)
|
123
|
+
end
|
124
|
+
end
|
125
|
+
|
126
|
+
context '3rd_metrics' do
|
127
|
+
it 'append existing metrics in the right place' do
|
128
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
129
|
+
all_metrics.add_metrics(metrics_worker_1)
|
130
|
+
all_metrics.add_metrics(metrics_worker_2)
|
131
|
+
all_metrics.add_metrics(metrics_worker_3)
|
132
|
+
result_str = all_metrics.get_metrics
|
133
|
+
|
134
|
+
expect(result_str).to eq(metrics_merged_1_and_3 + metrics_worker_2)
|
135
|
+
end
|
136
|
+
end
|
137
|
+
end
|
138
|
+
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: fluent-plugin-prometheus
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.
|
4
|
+
version: 1.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Masahiro Sano
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2019-
|
11
|
+
date: 2019-09-15 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: fluentd
|
@@ -122,6 +122,7 @@ files:
|
|
122
122
|
- lib/fluent/plugin/in_prometheus_tail_monitor.rb
|
123
123
|
- lib/fluent/plugin/out_prometheus.rb
|
124
124
|
- lib/fluent/plugin/prometheus.rb
|
125
|
+
- lib/fluent/plugin/prometheus_metrics.rb
|
125
126
|
- misc/fluentd_sample.conf
|
126
127
|
- misc/nginx_proxy.conf
|
127
128
|
- misc/prometheus.yaml
|
@@ -129,6 +130,7 @@ files:
|
|
129
130
|
- spec/fluent/plugin/filter_prometheus_spec.rb
|
130
131
|
- spec/fluent/plugin/in_prometheus_monitor_spec.rb
|
131
132
|
- spec/fluent/plugin/out_prometheus_spec.rb
|
133
|
+
- spec/fluent/plugin/prometheus_metrics_spec.rb
|
132
134
|
- spec/fluent/plugin/prometheus_spec.rb
|
133
135
|
- spec/fluent/plugin/shared.rb
|
134
136
|
- spec/spec_helper.rb
|
@@ -151,7 +153,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
151
153
|
- !ruby/object:Gem::Version
|
152
154
|
version: '0'
|
153
155
|
requirements: []
|
154
|
-
|
156
|
+
rubyforge_project:
|
157
|
+
rubygems_version: 2.7.6
|
155
158
|
signing_key:
|
156
159
|
specification_version: 4
|
157
160
|
summary: A fluent plugin that collects metrics and exposes for Prometheus.
|
@@ -159,6 +162,7 @@ test_files:
|
|
159
162
|
- spec/fluent/plugin/filter_prometheus_spec.rb
|
160
163
|
- spec/fluent/plugin/in_prometheus_monitor_spec.rb
|
161
164
|
- spec/fluent/plugin/out_prometheus_spec.rb
|
165
|
+
- spec/fluent/plugin/prometheus_metrics_spec.rb
|
162
166
|
- spec/fluent/plugin/prometheus_spec.rb
|
163
167
|
- spec/fluent/plugin/shared.rb
|
164
168
|
- spec/spec_helper.rb
|