fluent-plugin-prometheus 1.5.0 → 1.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.travis.yml +3 -3
- data/README.md +31 -17
- data/fluent-plugin-prometheus.gemspec +1 -1
- data/lib/fluent/plugin/filter_prometheus.rb +2 -1
- data/lib/fluent/plugin/in_prometheus.rb +50 -0
- data/lib/fluent/plugin/in_prometheus_monitor.rb +17 -2
- data/lib/fluent/plugin/in_prometheus_output_monitor.rb +37 -6
- data/lib/fluent/plugin/in_prometheus_tail_monitor.rb +2 -1
- data/lib/fluent/plugin/out_prometheus.rb +2 -1
- data/lib/fluent/plugin/prometheus.rb +25 -0
- data/lib/fluent/plugin/prometheus_metrics.rb +77 -0
- data/misc/prometheus_alerts.yaml +1 -1
- data/spec/fluent/plugin/prometheus_metrics_spec.rb +138 -0
- metadata +7 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3f079822fa7231b77912e5ac1324b691f38dbf1f1f32fd8700dee19bc636fd8c
|
4
|
+
data.tar.gz: 58ad623a333cc1600e971fdd35b9b2c7d4a381c03a954d99206233758e4562f6
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 4394983e398e9e4b116f54927b7ad82778ab672b27e41bfc1ef3eb5421f3f3cde7899ce906d86de8862179c0293c1831c0a555553f21c10787dddba22c68fe1e
|
7
|
+
data.tar.gz: f8d52d61a493f4c8abfc88a1cfdc415779a4946eeed639b03545574b2a86304d7a560c026391897ab93596280be306a02c24d3dcdbc161c30f625c211d237711
|
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -58,18 +58,25 @@ More configuration parameters:
|
|
58
58
|
- `bind`: binding interface (default: '0.0.0.0')
|
59
59
|
- `port`: listen port (default: 24231)
|
60
60
|
- `metrics_path`: metrics HTTP endpoint (default: /metrics)
|
61
|
+
- `aggregated_metrics_path`: metrics HTTP endpoint (default: /aggregated_metrics)
|
61
62
|
|
62
63
|
When using multiple workers, each worker binds to port + `fluent_worker_id`.
|
64
|
+
To scrape metrics from all workers at once, you can access http://localhost:24231/aggregated_metrics.
|
63
65
|
|
64
66
|
### prometheus_monitor input plugin
|
65
67
|
|
66
68
|
This plugin collects internal metrics in Fluentd. The metrics are similar to/part of [monitor_agent](https://docs.fluentd.org/input/monitor_agent).
|
67
69
|
|
68
|
-
Current exposed metrics:
|
69
70
|
|
70
|
-
|
71
|
-
|
72
|
-
- `
|
71
|
+
#### Exposed metrics
|
72
|
+
|
73
|
+
- `fluentd_status_buffer_queue_length`
|
74
|
+
- `fluentd_status_buffer_total_queued_size`
|
75
|
+
- `fluentd_status_retry_count`
|
76
|
+
- `fluentd_status_buffer_newest_timekey` from fluentd v1.4.2
|
77
|
+
- `fluentd_status_buffer_oldest_timekey` from fluentd v1.4.2
|
78
|
+
|
79
|
+
#### Configuration
|
73
80
|
|
74
81
|
With following configuration, those metrics are collected.
|
75
82
|
|
@@ -86,28 +93,35 @@ More configuration parameters:
|
|
86
93
|
|
87
94
|
### prometheus_output_monitor input plugin
|
88
95
|
|
89
|
-
**experimental**
|
90
|
-
|
91
96
|
This plugin collects internal metrics for output plugin in Fluentd. This is similar to `prometheus_monitor` plugin, but specialized for output plugin. There are Many metrics `prometheus_monitor` does not include, such as `num_errors`, `retry_wait` and so on.
|
92
97
|
|
93
|
-
|
98
|
+
#### Exposed metrics
|
99
|
+
|
100
|
+
Metrics for output
|
94
101
|
|
95
|
-
- `fluentd_output_status_buffer_queue_length`
|
96
|
-
- `fluentd_output_status_buffer_total_bytes`
|
97
102
|
- `fluentd_output_status_retry_count`
|
98
103
|
- `fluentd_output_status_num_errors`
|
99
104
|
- `fluentd_output_status_emit_count`
|
100
|
-
- `fluentd_output_status_flush_time_count`
|
101
|
-
- `fluentd_output_status_slow_flush_count`
|
102
105
|
- `fluentd_output_status_retry_wait`
|
103
106
|
- current retry_wait computed from last retry time and next retry time
|
104
107
|
- `fluentd_output_status_emit_records`
|
105
|
-
- only for v0.14
|
106
108
|
- `fluentd_output_status_write_count`
|
107
|
-
- only for v0.14
|
108
109
|
- `fluentd_output_status_rollback_count`
|
109
|
-
|
110
|
+
- `fluentd_output_status_flush_time_count` from fluentd v1.6.0
|
111
|
+
- `fluentd_output_status_slow_flush_count` from fluentd v1.6.0
|
110
112
|
|
113
|
+
Metrics for buffer
|
114
|
+
|
115
|
+
- `fluentd_output_status_buffer_total_bytes`
|
116
|
+
- `fluentd_output_status_buffer_stage_length` from fluentd v1.6.0
|
117
|
+
- `fluentd_output_status_buffer_stage_byte_size` from fluentd v1.6.0
|
118
|
+
- `fluentd_output_status_buffer_queue_length`
|
119
|
+
- `fluentd_output_status_buffer_queue_byte_size` from fluentd v1.6.0
|
120
|
+
- `fluentd_output_status_buffer_newest_timekey` from fluentd v1.6.0
|
121
|
+
- `fluentd_output_status_buffer_oldest_timekey` from fluentd v1.6.0
|
122
|
+
- `fluentd_output_status_buffer_available_space_ratio` from fluentd v1.6.0
|
123
|
+
|
124
|
+
#### Configuration
|
111
125
|
|
112
126
|
With following configuration, those metrics are collected.
|
113
127
|
|
@@ -124,13 +138,11 @@ More configuration parameters:
|
|
124
138
|
|
125
139
|
### prometheus_tail_monitor input plugin
|
126
140
|
|
127
|
-
**experimental**
|
128
|
-
|
129
141
|
This plugin collects internal metrics for in_tail plugin in Fluentd. in_tail plugin holds internal state for files that the plugin is watching. The state is sometimes important to monitor plugins work correctly.
|
130
142
|
|
131
143
|
This plugin uses internal class of Fluentd, so it's easy to break.
|
132
144
|
|
133
|
-
|
145
|
+
#### Exposed metrics
|
134
146
|
|
135
147
|
- `fluentd_tail_file_position`
|
136
148
|
- Current bytes which plugin reads from the file
|
@@ -143,6 +155,8 @@ Default labels:
|
|
143
155
|
- `type`: plugin name. `in_tail` only for now.
|
144
156
|
- `path`: file path
|
145
157
|
|
158
|
+
#### Configuration
|
159
|
+
|
146
160
|
With following configuration, those metrics are collected.
|
147
161
|
|
148
162
|
```
|
@@ -1,6 +1,6 @@
|
|
1
1
|
Gem::Specification.new do |spec|
|
2
2
|
spec.name = "fluent-plugin-prometheus"
|
3
|
-
spec.version = "1.
|
3
|
+
spec.version = "1.6.0"
|
4
4
|
spec.authors = ["Masahiro Sano"]
|
5
5
|
spec.email = ["sabottenda@gmail.com"]
|
6
6
|
spec.summary = %q{A fluent plugin that collects metrics and exposes for Prometheus.}
|
@@ -4,6 +4,7 @@ require 'fluent/plugin/filter'
|
|
4
4
|
module Fluent::Plugin
|
5
5
|
class PrometheusFilter < Fluent::Plugin::Filter
|
6
6
|
Fluent::Plugin.register_filter('prometheus', self)
|
7
|
+
include Fluent::Plugin::PrometheusLabelParser
|
7
8
|
include Fluent::Plugin::Prometheus
|
8
9
|
|
9
10
|
def initialize
|
@@ -17,7 +18,7 @@ module Fluent::Plugin
|
|
17
18
|
|
18
19
|
def configure(conf)
|
19
20
|
super
|
20
|
-
labels =
|
21
|
+
labels = parse_labels_elements(conf)
|
21
22
|
@metrics = Fluent::Plugin::Prometheus.parse_metrics_elements(conf, @registry, labels)
|
22
23
|
end
|
23
24
|
|
@@ -1,5 +1,7 @@
|
|
1
1
|
require 'fluent/plugin/input'
|
2
2
|
require 'fluent/plugin/prometheus'
|
3
|
+
require 'fluent/plugin/prometheus_metrics'
|
4
|
+
require 'net/http'
|
3
5
|
require 'webrick'
|
4
6
|
|
5
7
|
module Fluent::Plugin
|
@@ -11,6 +13,7 @@ module Fluent::Plugin
|
|
11
13
|
config_param :bind, :string, default: '0.0.0.0'
|
12
14
|
config_param :port, :integer, default: 24231
|
13
15
|
config_param :metrics_path, :string, default: '/metrics'
|
16
|
+
config_param :aggregated_metrics_path, :string, default: '/aggregated_metrics'
|
14
17
|
|
15
18
|
desc 'Enable ssl configuration for the server'
|
16
19
|
config_section :ssl, required: false, multi: false do
|
@@ -31,6 +34,10 @@ module Fluent::Plugin
|
|
31
34
|
|
32
35
|
attr_reader :registry
|
33
36
|
|
37
|
+
attr_reader :num_workers
|
38
|
+
attr_reader :base_port
|
39
|
+
attr_reader :metrics_path
|
40
|
+
|
34
41
|
def initialize
|
35
42
|
super
|
36
43
|
@registry = ::Prometheus::Client.registry
|
@@ -38,6 +45,18 @@ module Fluent::Plugin
|
|
38
45
|
|
39
46
|
def configure(conf)
|
40
47
|
super
|
48
|
+
|
49
|
+
# Get how many workers we have
|
50
|
+
sysconf = if self.respond_to?(:owner) && owner.respond_to?(:system_config)
|
51
|
+
owner.system_config
|
52
|
+
elsif self.respond_to?(:system_config)
|
53
|
+
self.system_config
|
54
|
+
else
|
55
|
+
nil
|
56
|
+
end
|
57
|
+
@num_workers = sysconf && sysconf.workers ? sysconf.workers : 1
|
58
|
+
|
59
|
+
@base_port = @port
|
41
60
|
@port += fluentd_worker_id
|
42
61
|
end
|
43
62
|
|
@@ -82,6 +101,7 @@ module Fluent::Plugin
|
|
82
101
|
|
83
102
|
@server = WEBrick::HTTPServer.new(config)
|
84
103
|
@server.mount(@metrics_path, MonitorServlet, self)
|
104
|
+
@server.mount(@aggregated_metrics_path, MonitorServletAll, self)
|
85
105
|
thread_create(:in_prometheus) do
|
86
106
|
@server.start
|
87
107
|
end
|
@@ -110,5 +130,35 @@ module Fluent::Plugin
|
|
110
130
|
res.body = $!.to_s
|
111
131
|
end
|
112
132
|
end
|
133
|
+
|
134
|
+
class MonitorServletAll < WEBrick::HTTPServlet::AbstractServlet
|
135
|
+
def initialize(server, prometheus)
|
136
|
+
@prometheus = prometheus
|
137
|
+
end
|
138
|
+
|
139
|
+
def do_GET(req, res)
|
140
|
+
res.status = 200
|
141
|
+
res['Content-Type'] = ::Prometheus::Client::Formats::Text::CONTENT_TYPE
|
142
|
+
|
143
|
+
full_result = PromMetricsAggregator.new
|
144
|
+
fluent_server_ip = @prometheus.bind == '0.0.0.0' ? '127.0.0.1' : @prometheus.bind
|
145
|
+
current_worker = 0
|
146
|
+
while current_worker < @prometheus.num_workers
|
147
|
+
Net::HTTP.start(fluent_server_ip, @prometheus.base_port + current_worker) do |http|
|
148
|
+
req = Net::HTTP::Get.new(@prometheus.metrics_path)
|
149
|
+
result = http.request(req)
|
150
|
+
if result.is_a?(Net::HTTPSuccess)
|
151
|
+
full_result.add_metrics(result.body)
|
152
|
+
end
|
153
|
+
end
|
154
|
+
current_worker += 1
|
155
|
+
end
|
156
|
+
res.body = full_result.get_metrics
|
157
|
+
rescue
|
158
|
+
res.status = 500
|
159
|
+
res['Content-Type'] = 'text/plain'
|
160
|
+
res.body = $!.to_s
|
161
|
+
end
|
162
|
+
end
|
113
163
|
end
|
114
164
|
end
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusMonitorInput < Fluent::Plugin::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -25,7 +26,7 @@ module Fluent::Plugin
|
|
25
26
|
hostname = Socket.gethostname
|
26
27
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
27
28
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
28
|
-
@base_labels =
|
29
|
+
@base_labels = parse_labels_elements(conf)
|
29
30
|
@base_labels.each do |key, value|
|
30
31
|
unless value.is_a?(String)
|
31
32
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_monitor"
|
@@ -45,6 +46,12 @@ module Fluent::Plugin
|
|
45
46
|
def start
|
46
47
|
super
|
47
48
|
|
49
|
+
@buffer_newest_timekey = @registry.gauge(
|
50
|
+
:fluentd_status_buffer_newest_timekey,
|
51
|
+
'Newest timekey in buffer.')
|
52
|
+
@buffer_oldest_timekey = @registry.gauge(
|
53
|
+
:fluentd_status_buffer_oldest_timekey,
|
54
|
+
'Oldest timekey in buffer.')
|
48
55
|
buffer_queue_length = @registry.gauge(
|
49
56
|
:fluentd_status_buffer_queue_length,
|
50
57
|
'Current buffer queue length.')
|
@@ -65,11 +72,19 @@ module Fluent::Plugin
|
|
65
72
|
|
66
73
|
def update_monitor_info
|
67
74
|
@monitor_agent.plugins_info_all.each do |info|
|
75
|
+
label = labels(info)
|
76
|
+
|
68
77
|
@monitor_info.each do |name, metric|
|
69
78
|
if info[name]
|
70
|
-
metric.set(
|
79
|
+
metric.set(label, info[name])
|
71
80
|
end
|
72
81
|
end
|
82
|
+
|
83
|
+
timekeys = info["buffer_timekeys"]
|
84
|
+
if timekeys && !timekeys.empty?
|
85
|
+
@buffer_newest_timekey.set(label, timekeys.max)
|
86
|
+
@buffer_oldest_timekey.set(label, timekeys.min)
|
87
|
+
end
|
73
88
|
end
|
74
89
|
end
|
75
90
|
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusOutputMonitorInput < Fluent::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_output_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -44,7 +45,7 @@ module Fluent::Plugin
|
|
44
45
|
hostname = Socket.gethostname
|
45
46
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
46
47
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
47
|
-
@base_labels =
|
48
|
+
@base_labels = parse_labels_elements(conf)
|
48
49
|
@base_labels.each do |key, value|
|
49
50
|
unless value.is_a?(String)
|
50
51
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_output_monitor"
|
@@ -64,12 +65,33 @@ module Fluent::Plugin
|
|
64
65
|
super
|
65
66
|
|
66
67
|
@metrics = {
|
67
|
-
|
68
|
-
:fluentd_output_status_buffer_queue_length,
|
69
|
-
'Current buffer queue length.'),
|
68
|
+
# Buffer metrics
|
70
69
|
buffer_total_queued_size: @registry.gauge(
|
71
70
|
:fluentd_output_status_buffer_total_bytes,
|
72
|
-
'Current total size of
|
71
|
+
'Current total size of stage and queue buffers.'),
|
72
|
+
buffer_stage_length: @registry.gauge(
|
73
|
+
:fluentd_output_status_buffer_stage_length,
|
74
|
+
'Current length of stage buffers.'),
|
75
|
+
buffer_stage_byte_size: @registry.gauge(
|
76
|
+
:fluentd_output_status_buffer_stage_byte_size,
|
77
|
+
'Current total size of stage buffers.'),
|
78
|
+
buffer_queue_length: @registry.gauge(
|
79
|
+
:fluentd_output_status_buffer_queue_length,
|
80
|
+
'Current length of queue buffers.'),
|
81
|
+
buffer_queue_byte_size: @registry.gauge(
|
82
|
+
:fluentd_output_status_queue_byte_size,
|
83
|
+
'Current total size of queue buffers.'),
|
84
|
+
buffer_available_buffer_space_ratios: @registry.gauge(
|
85
|
+
:fluentd_output_status_buffer_available_space_ratio,
|
86
|
+
'Ratio of available space in buffer.'),
|
87
|
+
buffer_newest_timekey: @registry.gauge(
|
88
|
+
:fluentd_output_status_buffer_newest_timekey,
|
89
|
+
'Newest timekey in buffer.'),
|
90
|
+
buffer_oldest_timekey: @registry.gauge(
|
91
|
+
:fluentd_output_status_buffer_oldest_timekey,
|
92
|
+
'Oldest timekey in buffer.'),
|
93
|
+
|
94
|
+
# Output metrics
|
73
95
|
retry_counts: @registry.gauge(
|
74
96
|
:fluentd_output_status_retry_count,
|
75
97
|
'Current retry counts.'),
|
@@ -112,8 +134,17 @@ module Fluent::Plugin
|
|
112
134
|
}
|
113
135
|
|
114
136
|
monitor_info = {
|
115
|
-
|
137
|
+
# buffer metrics
|
116
138
|
'buffer_total_queued_size' => @metrics[:buffer_total_queued_size],
|
139
|
+
'buffer_stage_length' => @metrics[:buffer_stage_length],
|
140
|
+
'buffer_stage_byte_size' => @metrics[:buffer_stage_byte_size],
|
141
|
+
'buffer_queue_length' => @metrics[:buffer_queue_length],
|
142
|
+
'buffer_queue_byte_size' => @metrics[:buffer_queue_byte_size],
|
143
|
+
'buffer_available_buffer_space_ratios' => @metrics[:buffer_available_buffer_space_ratios],
|
144
|
+
'buffer_newest_timekey' => @metrics[:buffer_newest_timekey],
|
145
|
+
'buffer_oldest_timekey' => @metrics[:buffer_oldest_timekey],
|
146
|
+
|
147
|
+
# output metrics
|
117
148
|
'retry_count' => @metrics[:retry_counts],
|
118
149
|
}
|
119
150
|
instance_vars_info = {
|
@@ -5,6 +5,7 @@ require 'fluent/plugin/prometheus'
|
|
5
5
|
module Fluent::Plugin
|
6
6
|
class PrometheusTailMonitorInput < Fluent::Plugin::Input
|
7
7
|
Fluent::Plugin.register_input('prometheus_tail_monitor', self)
|
8
|
+
include Fluent::Plugin::PrometheusLabelParser
|
8
9
|
|
9
10
|
helpers :timer
|
10
11
|
|
@@ -29,7 +30,7 @@ module Fluent::Plugin
|
|
29
30
|
hostname = Socket.gethostname
|
30
31
|
expander = Fluent::Plugin::Prometheus.placeholder_expander(log)
|
31
32
|
placeholders = expander.prepare_placeholders({'hostname' => hostname, 'worker_id' => fluentd_worker_id})
|
32
|
-
@base_labels =
|
33
|
+
@base_labels = parse_labels_elements(conf)
|
33
34
|
@base_labels.each do |key, value|
|
34
35
|
unless value.is_a?(String)
|
35
36
|
raise Fluent::ConfigError, "record accessor syntax is not available in prometheus_tail_monitor"
|
@@ -4,6 +4,7 @@ require 'fluent/plugin/prometheus'
|
|
4
4
|
module Fluent::Plugin
|
5
5
|
class PrometheusOutput < Fluent::Plugin::Output
|
6
6
|
Fluent::Plugin.register_output('prometheus', self)
|
7
|
+
include Fluent::Plugin::PrometheusLabelParser
|
7
8
|
include Fluent::Plugin::Prometheus
|
8
9
|
|
9
10
|
def initialize
|
@@ -17,7 +18,7 @@ module Fluent::Plugin
|
|
17
18
|
|
18
19
|
def configure(conf)
|
19
20
|
super
|
20
|
-
labels =
|
21
|
+
labels = parse_labels_elements(conf)
|
21
22
|
@metrics = Fluent::Plugin::Prometheus.parse_metrics_elements(conf, @registry, labels)
|
22
23
|
end
|
23
24
|
|
@@ -4,6 +4,31 @@ require 'fluent/plugin/filter_record_transformer'
|
|
4
4
|
|
5
5
|
module Fluent
|
6
6
|
module Plugin
|
7
|
+
module PrometheusLabelParser
|
8
|
+
def configure(conf)
|
9
|
+
super
|
10
|
+
# Check if running with multiple workers
|
11
|
+
sysconf = if self.respond_to?(:owner) && owner.respond_to?(:system_config)
|
12
|
+
owner.system_config
|
13
|
+
elsif self.respond_to?(:system_config)
|
14
|
+
self.system_config
|
15
|
+
else
|
16
|
+
nil
|
17
|
+
end
|
18
|
+
@multi_worker = sysconf && sysconf.workers ? (sysconf.workers > 1) : false
|
19
|
+
end
|
20
|
+
|
21
|
+
def parse_labels_elements(conf)
|
22
|
+
base_labels = Fluent::Plugin::Prometheus.parse_labels_elements(conf)
|
23
|
+
|
24
|
+
if @multi_worker
|
25
|
+
base_labels[:worker_id] = fluentd_worker_id.to_s
|
26
|
+
end
|
27
|
+
|
28
|
+
base_labels
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
7
32
|
module Prometheus
|
8
33
|
class AlreadyRegisteredError < StandardError; end
|
9
34
|
|
@@ -0,0 +1,77 @@
|
|
1
|
+
module Fluent::Plugin
|
2
|
+
|
3
|
+
##
|
4
|
+
# PromMetricsAggregator aggregates multiples metrics exposed using Prometheus text-based format
|
5
|
+
# see https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md
|
6
|
+
|
7
|
+
|
8
|
+
class PrometheusMetrics
|
9
|
+
def initialize
|
10
|
+
@comments = []
|
11
|
+
@metrics = []
|
12
|
+
end
|
13
|
+
|
14
|
+
def to_string
|
15
|
+
(@comments + @metrics).join("\n")
|
16
|
+
end
|
17
|
+
|
18
|
+
def add_comment(comment)
|
19
|
+
@comments << comment
|
20
|
+
end
|
21
|
+
|
22
|
+
def add_metric_value(value)
|
23
|
+
@metrics << value
|
24
|
+
end
|
25
|
+
|
26
|
+
attr_writer :comments, :metrics
|
27
|
+
end
|
28
|
+
|
29
|
+
class PromMetricsAggregator
|
30
|
+
def initialize
|
31
|
+
@metrics = {}
|
32
|
+
end
|
33
|
+
|
34
|
+
def get_metric_name_from_comment(line)
|
35
|
+
tokens = line.split(' ')
|
36
|
+
if ['HELP', 'TYPE'].include?(tokens[1])
|
37
|
+
tokens[2]
|
38
|
+
else
|
39
|
+
''
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
def add_metrics(metrics)
|
44
|
+
current_metric = ''
|
45
|
+
new_metric = false
|
46
|
+
lines = metrics.split("\n")
|
47
|
+
for line in lines
|
48
|
+
if line[0] == '#'
|
49
|
+
# Metric comment (# TYPE, # HELP)
|
50
|
+
parsed_metric = get_metric_name_from_comment(line)
|
51
|
+
if parsed_metric != ''
|
52
|
+
if parsed_metric != current_metric
|
53
|
+
# Starting a new metric comment block
|
54
|
+
new_metric = !@metrics.key?(parsed_metric)
|
55
|
+
if new_metric
|
56
|
+
@metrics[parsed_metric] = PrometheusMetrics.new()
|
57
|
+
end
|
58
|
+
current_metric = parsed_metric
|
59
|
+
end
|
60
|
+
|
61
|
+
if new_metric && parsed_metric == current_metric
|
62
|
+
# New metric, inject comments (# TYPE, # HELP)
|
63
|
+
@metrics[parsed_metric].add_comment(line)
|
64
|
+
end
|
65
|
+
end
|
66
|
+
else
|
67
|
+
# Metric value, simply append line
|
68
|
+
@metrics[current_metric].add_metric_value(line)
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
72
|
+
|
73
|
+
def get_metrics
|
74
|
+
@metrics.map{|k,v| v.to_string()}.join("\n") + (@metrics.length ? "\n" : "")
|
75
|
+
end
|
76
|
+
end
|
77
|
+
end
|
data/misc/prometheus_alerts.yaml
CHANGED
@@ -47,7 +47,7 @@ ALERT FluentdQueueLength
|
|
47
47
|
}
|
48
48
|
|
49
49
|
ALERT FluentdRecordsCountsHigh
|
50
|
-
IF sum(rate(
|
50
|
+
IF sum(rate(fluentd_output_status_emit_records{job="fluentd"}[5m])) BY (instance) > (3 * sum(rate(fluentd_output_status_emit_records{job="fluentd"}[15m])) BY (instance))
|
51
51
|
FOR 1m
|
52
52
|
LABELS {
|
53
53
|
service = "fluentd",
|
@@ -0,0 +1,138 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
require 'fluent/plugin/in_prometheus'
|
3
|
+
require 'fluent/test/driver/input'
|
4
|
+
|
5
|
+
require 'net/http'
|
6
|
+
|
7
|
+
describe Fluent::Plugin::PromMetricsAggregator do
|
8
|
+
|
9
|
+
metrics_worker_1 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
10
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
11
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
12
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
13
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
14
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
15
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
16
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
17
|
+
# TYPE log_counter counter
|
18
|
+
# HELP log_counter the number of received logs
|
19
|
+
log_counter{worker_id="0",host="0123456789ab",tag="fluent.info"} 1.0
|
20
|
+
# HELP empty_metric A metric with no data
|
21
|
+
# TYPE empty_metric gauge
|
22
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
23
|
+
# TYPE http_request_duration_seconds histogram
|
24
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.005"} 58
|
25
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.01"} 58
|
26
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.05"} 59
|
27
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.1"} 59
|
28
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="1"} 59
|
29
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="10"} 59
|
30
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="+Inf"} 59
|
31
|
+
http_request_duration_seconds_sum{code="200",worker_id="0",method="GET"} 0.05046115500000003
|
32
|
+
http_request_duration_seconds_count{code="200",worker_id="0",method="GET"} 59
|
33
|
+
]
|
34
|
+
|
35
|
+
metrics_worker_2 = %[# TYPE fluentd_output_status_buffer_queue_length gauge
|
36
|
+
# HELP fluentd_output_status_buffer_queue_length Current buffer queue length.
|
37
|
+
fluentd_output_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",type="s3"} 0.0
|
38
|
+
fluentd_output_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",type="s3"} 0.0
|
39
|
+
# TYPE fluentd_output_status_buffer_total_bytes gauge
|
40
|
+
# HELP fluentd_output_status_buffer_total_bytes Current total size of queued buffers.
|
41
|
+
fluentd_output_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",type="s3"} 0.0
|
42
|
+
fluentd_output_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",type="s3"} 0.0
|
43
|
+
]
|
44
|
+
|
45
|
+
metrics_worker_3 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
46
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
47
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
48
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
49
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
50
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
51
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
52
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
53
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
54
|
+
# TYPE http_request_duration_seconds histogram
|
55
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.005"} 70
|
56
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.01"} 70
|
57
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.05"} 71
|
58
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.1"} 71
|
59
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="1"} 71
|
60
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="10"} 71
|
61
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="+Inf"} 71
|
62
|
+
http_request_duration_seconds_sum{code="200",worker_id="1",method="GET"} 0.05646315600000003
|
63
|
+
http_request_duration_seconds_count{code="200",worker_id="1",method="GET"} 71
|
64
|
+
]
|
65
|
+
|
66
|
+
metrics_merged_1_and_3 = %[# TYPE fluentd_status_buffer_queue_length gauge
|
67
|
+
# HELP fluentd_status_buffer_queue_length Current buffer queue length.
|
68
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
69
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
70
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
71
|
+
fluentd_status_buffer_queue_length{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
72
|
+
# TYPE fluentd_status_buffer_total_bytes gauge
|
73
|
+
# HELP fluentd_status_buffer_total_bytes Current total size of queued buffers.
|
74
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
75
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="0",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
76
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-1",plugin_category="output",type="s3"} 0.0
|
77
|
+
fluentd_status_buffer_total_bytes{host="0123456789ab",worker_id="1",plugin_id="plugin-2",plugin_category="output",type="s3"} 0.0
|
78
|
+
# TYPE log_counter counter
|
79
|
+
# HELP log_counter the number of received logs
|
80
|
+
log_counter{worker_id="0",host="0123456789ab",tag="fluent.info"} 1.0
|
81
|
+
# HELP empty_metric A metric with no data
|
82
|
+
# TYPE empty_metric gauge
|
83
|
+
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
|
84
|
+
# TYPE http_request_duration_seconds histogram
|
85
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.005"} 58
|
86
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.01"} 58
|
87
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.05"} 59
|
88
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="0.1"} 59
|
89
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="1"} 59
|
90
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="10"} 59
|
91
|
+
http_request_duration_seconds_bucket{code="200",worker_id="0",method="GET",le="+Inf"} 59
|
92
|
+
http_request_duration_seconds_sum{code="200",worker_id="0",method="GET"} 0.05046115500000003
|
93
|
+
http_request_duration_seconds_count{code="200",worker_id="0",method="GET"} 59
|
94
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.005"} 70
|
95
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.01"} 70
|
96
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.05"} 71
|
97
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="0.1"} 71
|
98
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="1"} 71
|
99
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="10"} 71
|
100
|
+
http_request_duration_seconds_bucket{code="200",worker_id="1",method="GET",le="+Inf"} 71
|
101
|
+
http_request_duration_seconds_sum{code="200",worker_id="1",method="GET"} 0.05646315600000003
|
102
|
+
http_request_duration_seconds_count{code="200",worker_id="1",method="GET"} 71
|
103
|
+
]
|
104
|
+
|
105
|
+
describe 'add_metrics' do
|
106
|
+
context '1st_metrics' do
|
107
|
+
it 'adds all fields' do
|
108
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
109
|
+
all_metrics.add_metrics(metrics_worker_1)
|
110
|
+
result_str = all_metrics.get_metrics
|
111
|
+
|
112
|
+
expect(result_str).to eq(metrics_worker_1)
|
113
|
+
end
|
114
|
+
end
|
115
|
+
context '2nd_metrics' do
|
116
|
+
it 'append new metrics' do
|
117
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
118
|
+
all_metrics.add_metrics(metrics_worker_1)
|
119
|
+
all_metrics.add_metrics(metrics_worker_2)
|
120
|
+
result_str = all_metrics.get_metrics
|
121
|
+
|
122
|
+
expect(result_str).to eq(metrics_worker_1 + metrics_worker_2)
|
123
|
+
end
|
124
|
+
end
|
125
|
+
|
126
|
+
context '3rd_metrics' do
|
127
|
+
it 'append existing metrics in the right place' do
|
128
|
+
all_metrics = Fluent::Plugin::PromMetricsAggregator.new
|
129
|
+
all_metrics.add_metrics(metrics_worker_1)
|
130
|
+
all_metrics.add_metrics(metrics_worker_2)
|
131
|
+
all_metrics.add_metrics(metrics_worker_3)
|
132
|
+
result_str = all_metrics.get_metrics
|
133
|
+
|
134
|
+
expect(result_str).to eq(metrics_merged_1_and_3 + metrics_worker_2)
|
135
|
+
end
|
136
|
+
end
|
137
|
+
end
|
138
|
+
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: fluent-plugin-prometheus
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.
|
4
|
+
version: 1.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Masahiro Sano
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2019-
|
11
|
+
date: 2019-09-15 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: fluentd
|
@@ -122,6 +122,7 @@ files:
|
|
122
122
|
- lib/fluent/plugin/in_prometheus_tail_monitor.rb
|
123
123
|
- lib/fluent/plugin/out_prometheus.rb
|
124
124
|
- lib/fluent/plugin/prometheus.rb
|
125
|
+
- lib/fluent/plugin/prometheus_metrics.rb
|
125
126
|
- misc/fluentd_sample.conf
|
126
127
|
- misc/nginx_proxy.conf
|
127
128
|
- misc/prometheus.yaml
|
@@ -129,6 +130,7 @@ files:
|
|
129
130
|
- spec/fluent/plugin/filter_prometheus_spec.rb
|
130
131
|
- spec/fluent/plugin/in_prometheus_monitor_spec.rb
|
131
132
|
- spec/fluent/plugin/out_prometheus_spec.rb
|
133
|
+
- spec/fluent/plugin/prometheus_metrics_spec.rb
|
132
134
|
- spec/fluent/plugin/prometheus_spec.rb
|
133
135
|
- spec/fluent/plugin/shared.rb
|
134
136
|
- spec/spec_helper.rb
|
@@ -151,7 +153,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
151
153
|
- !ruby/object:Gem::Version
|
152
154
|
version: '0'
|
153
155
|
requirements: []
|
154
|
-
|
156
|
+
rubyforge_project:
|
157
|
+
rubygems_version: 2.7.6
|
155
158
|
signing_key:
|
156
159
|
specification_version: 4
|
157
160
|
summary: A fluent plugin that collects metrics and exposes for Prometheus.
|
@@ -159,6 +162,7 @@ test_files:
|
|
159
162
|
- spec/fluent/plugin/filter_prometheus_spec.rb
|
160
163
|
- spec/fluent/plugin/in_prometheus_monitor_spec.rb
|
161
164
|
- spec/fluent/plugin/out_prometheus_spec.rb
|
165
|
+
- spec/fluent/plugin/prometheus_metrics_spec.rb
|
162
166
|
- spec/fluent/plugin/prometheus_spec.rb
|
163
167
|
- spec/fluent/plugin/shared.rb
|
164
168
|
- spec/spec_helper.rb
|